FIELD OF THE INVENTION

The present invention relates to animal models useful for the study of lipid 
metabolism that have been genetically modified to express or mis-express proteins involved 
in the sterol regulatory element binding protein (SREBP) pathway. The invention also 
relates to novel SREBP pathway nucleic acid and polypeptide sequences and their uses. 

BACKGROUND OF THE INVENTION 

Triglycerides, phospholipids, and cholesterol, which form the three major classes of 
lipid, perform a variety of necessary functions in cell metabolism and are vital constituents 
of biological membranes. However, elevated levels of lipids and/or improper lipid 
metabolism have been implicated in a variety of health disorders. Of particular concern is 
increased blood cholesterol which leads to atherosclerosis (the deposition of cholesterol on 
arterial walls). This in turn may lead to heart disease, stroke or other disorders of the 
circulatory system. Accordingly, there is much interest within the pharmaceutical industry 
to understand the mechanisms involved in cholesterol synthesis and metabolism, 
particularly on the molecular level, so that blood cholesterol lowering drugs can be 
developed for the treatment or prevention of atherosclerosis. 

Recent advances have been made in understanding some of the mechanisms 
involved in mammalian lipid metabolism. A key component is the sterol regulatory 
element binding protein (SREBP) pathway. SREBPs are transcription factors that activate 
genes involved in cholesterol and fatty acid metabolism. In the cholesterol biosynthetic 
pathway of vertebrates, SREBPs directly activate transcription of the genes encoding 
3-hydroxy-3-methylglutaryl (HMG) coenzyme A synthase, HMG-CoA reductase, farnesyl 
diphosphate synthase, and squalene synthase. In the fatty acid and triglyceride biosynthetic 
pathways, the direct targets of SREBPs include fatty acid synthase, acetyl-CoA carboxylase, 
glycerol-3-phosphate acyltransferase, and acyl-CoA binding protein. Additionally, SREBPs 
modulate transcription of stearoyl-CoA desaturase-1 and lipoprotein lipase. SREBPs also 
directly activate transcription of the gene encoding the low density lipoprotein (LDL) 
receptor, which provides cholesterol and fatty acids through receptor-mediated endocytosis. 
SREBPs are also implicated in the process of fat cell differentiation and adipose cell gene 
expression, particularly as transcription factors that can promote adipogenesis in a dominant 
fashion (reviewed by Spiegelman et al., Cell (1996) 87:377-389). 






In high sterol conditions, SREBPs are retained as membrane-bound protein 
precursors that are kept inactive because they are attached to the nuclear envelope and 
endoplasmic reticulum (ER) and are therefore excluded from the nucleus. As depicted in 
Figure 1A, an SREBP in its membrane-bound form has large N-terminal and C-terminal 
segments facing the cytoplasm and a short loop projecting into the lumen of the organelle. 
The N-terminal domain is a transcription factor of the basic-helix-loop-helix-leucine zipper 
(bHLH-Zip) family, and contains an "acid blob" typical of many transcriptional activators 
(Brown and Goldstein, Cell (1997) 89:331-340). 

The N-terminal acid blob is followed by a basic helix-loop-helix/leucine zipper 
domain (bHLH-Zip) similar to those found in many other DNA-binding transcriptional 
regulators. bHLH-Zip domains have two functions: the helix-loop-helix subdomain 
mediates dimerization, and the basic region binds to specific DNA sequences that include a 
direct repeat of 5'-PyCAPy-3'. SREBP binds to the sequence 5'-ATCACCCCAC-3', which 
is known as "sterol regulatory element 1" (SRE-1) and is upstream of the LDL receptor 
gene. 
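
As an illustration of the direct repeat described above, the following minimal Python sketch scans the SRE-1 sequence for the 5'-PyCAPy-3' motif, where Py is a pyrimidine (C or T). The function name find_pycapy and the regular expression are illustrative assumptions; they are not part of the disclosure.

```python
import re

# SRE-1 sequence upstream of the LDL receptor gene, as quoted in the text.
SRE1 = "ATCACCCCAC"

# Py = pyrimidine (C or T). A lookahead is used so that overlapping
# occurrences of the tetramer would also be reported.
PYCAPY = re.compile(r"(?=([CT]CA[CT]))")


def find_pycapy(seq):
    """Return (0-based start, tetramer) for every PyCAPy occurrence in seq."""
    return [(m.start(), m.group(1)) for m in PYCAPY.finditer(seq.upper())]


if __name__ == "__main__":
    for start, site in find_pycapy(SRE1):
        print(start, site)
```

Under these assumptions, two copies of the motif are found in SRE-1 in the same orientation, which is consistent with the "direct repeat" wording.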

SREBPs are unique among bHLH-Zip proteins by virtue of the C-terminal domains 
attached to the bHLH-Zip domain. These include (from N- to C-terminus): (1) a 
hydrophobic membrane-spanning sequence of about 20 amino acids, (2) a hydrophilic 
stretch of about 31 amino acids that projects into the lumen of the ER, (3) a second 
hydrophobic membrane-spanning domain of about 20 amino acids, and (4) a C-terminal 
domain which, in vertebrates, has been determined to be required for sterol regulation of 
SREBP cleavage. 

In low sterol conditions, the acid blob/bHLH-Zip domain of SREBP is released 
from the membrane, after which it is rapidly translocated into the nucleus and binds specific 
DNA sequences to activate transcription. Two sequential proteolytic cleavages are 
involved. Referring to Figure 1B, a first protease, referred to as the site 1 protease (S1P), 
cleaves SREBP at approximately the middle of the lumenal loop. S1P has been cloned 
from Chinese hamster ovary (CHO) cells (GenBank Identifier No. (hereinafter "GI") 
3892203) and a human cell line (GI4506774) (Sakai et al., J. Biol. Chem. (1998) 
273:5785-5793), and encodes a membrane-bound glycoprotein of 1052 amino acids with 
subtilisin-like sequence features. 

After cleavage at site 1, a second protease (the site 2 protease, S2P) cleaves the 
N-terminal fragment and releases the mature N-terminal domain into the cytosol, from 
which it rapidly enters the nucleus, apparently with a portion of the transmembrane domain 
still attached at the C-terminus. Mature, transcriptionally active SREBP is rapidly degraded 
in a proteasome-dependent process. This combination of proteolytic processing and rapid 
turnover allows the SREBP system to rapidly respond to changes in cellular membrane 
components. S2P homologues have been identified in both vertebrates and invertebrates 
and have been cloned from human cells and hamster cells (Rawson et al., Molec. Cell 
(1997) 1:47-57). S2P is a membrane protein containing an HEXXH sequence characteristic of 
zinc metalloproteases. This family of proteins has high hydrophobicity throughout the 
amino acid sequence, suggesting the existence of several membrane-spanning regions. 

A third component of the processing system for SREBPs is called SREBP Cleavage 
Activating Protein (SCAP). SCAP is a large transmembrane protein that activates S1P in 
low-sterol conditions. The N-terminal 730 amino acids have alternating hydrophobic and 
hydrophilic sequences which are predicted to form up to eight membrane-spanning 
sequences separated by short hydrophilic stretches. This domain is strikingly similar to a 
domain of HMG-CoA reductase (Hua et al., Cell (1996) 87:415-426) which is necessary to 
impart sterol regulation. In low sterol conditions, HMG-CoA reductase is quite stable, but 
when sterols are added the enzyme is rapidly degraded. It is believed that the 
membrane-spanning domain in SCAP, like its counterpart in HMG-CoA reductase, can sense the levels 
of sterol in the ER membrane, either directly or indirectly. 

The C-terminal domain of SCAP is hydrophilic and is made up of about 550 amino 
acids organized into four WD repeats. Recent work has demonstrated that these WD 
repeats bind directly to the C-terminal regulatory domain of SREBP, suggesting that SCAP 
and SREBP are part of a stable complex in the membrane of the ER (Sakai et al., supra). It 
is likely that S1P and perhaps S2P are also part of the complex, since SCAP is essential for 
activation of S1P activity. This SREBP processing complex is depicted in Figure 2. 

The involvement of the SREBP pathway in the regulation of cholesterol metabolism 
is of interest not only because excess blood cholesterol can lead to atherosclerosis, but also 
because there seem to be parallels between the processing of SREBPs and the processing of 
β-amyloid precursor protein, which has been implicated in Alzheimer's disease (Brown and 
Goldstein, supra). To date, the SREBP pathway has been studied primarily using 
mammalian cell culture, by the isolation of mutant cells that are defective in regulation of 
cholesterol metabolism or intracellular cholesterol trafficking. The mutants can then serve 
as hosts for cloning genes by functional complementation. This has led to the molecular 
cloning of the S1P, S2P and SCAP genes (Rawson et al., supra; Hua et al., supra; and 
Goldstein et al., US Pat. Nos. 5,527,690 and 5,891,631). 

Some SREBP pathway genes have been identified in invertebrates. The isolation of 
a Drosophila SREBP, referred to as "HLH106" (GI079656), has been described (Theopold 
et al., Proc. Natl. Acad. Sci. USA (1996) 93(3):1195-1199). An expressed sequence tag 
(EST) from C. elegans which has homology to S2P is described by Rawson et al., supra, 
and is listed in GenBank (GI1559384). Additionally, GenBank has listed a protein 
predicted from the C. elegans genome as having HMG-CoA reductase homology 
(GI3875380). 

SUMMARY OF THE INVENTION 

The use of invertebrate model organism genetics can greatly facilitate the 
elucidation of biochemical pathways, and the identification of molecules that can modulate 
such pathways. Accordingly, it is an object of the invention to provide invertebrate nucleic 
acids and polypeptides involved in the SREBP pathway. It is also an object of the invention 
to provide invertebrate model organisms, including novel mutant phenotypes, for the study 
of lipid metabolism in general, and more particularly, for the elucidation of the SREBP 
pathway. It is a further object of the invention to provide methods for screening molecules 
that modulate lipid metabolism and/or the function of genes and proteins involved in the 
SREBP pathway. 

These and other objects are provided by flies and nematodes that have been 
genetically modified to express or mis-express an SREBP pathway gene, for example using 
transposon mutagenesis, RNA interference, chemical mutagenesis, or other genetic 
techniques. In certain embodiments, expression of the SREBP pathway protein is driven by 
a heterologous promoter that is tissue-specific, developmentally-specific, or inducible, so 
that the effects of the expression or mis-expression can be observed in specific tissues, at 
certain developmental stages, or at specified times, respectively. Additionally, the SREBP 
pathway protein may be linked to one or more selectable markers that allow detection of 
expression. Typically, the expression of the SREBP pathway protein results in an 
identifiable phenotype. In the case of nematodes, the invention provides novel methods for 
the in vivo measurement of lipid content using BODIPY-fatty acid conjugates. The animal 
models can be used in genetic screens to identify other genes involved in lipid metabolism. 
They can also be used for screening small molecule libraries directly on whole organisms 
for possible therapeutic or pesticide use. 

The invention also provides novel isolated nucleic acids (SEQ ID NOs:1, 3, and 5) 
and the SREBP pathway proteins encoded thereby (SEQ ID NOs:2, 4, and 6, respectively), 
as well as derivatives and fragments thereof. Methods are provided for constructing vectors 
containing the isolated nucleic acids. Such vectors can be used for making the animal 
models of the invention. They can also be introduced into host cells to be used for a variety 
of purposes including two-hybrid screening assays, production of SREBP pathway proteins, 
screening small molecules that affect lipid synthesis or metabolism, etc. 



BRIEF DESCRIPTION OF THE DRAWINGS 
Figs. 1A and 1B depict the inactive, membrane-bound form of SREBP (Fig. 1A) 
and the two-step proteolytic cleavage which activates SREBP in low sterol conditions (Fig. 
1B). 

Fig. 2 depicts the presumed interactions between SREBP, SCAP, S1P and S2P in 
the SREBP processing complex. 

Figs. 3A-3E show a cDNA sequence that encodes C. elegans SREBP (SEQ ID 
NO:1). 

Fig. 4 shows the predicted amino acid sequence (SEQ ID NO:2) of the polypeptide 
encoded by the C. elegans SREBP gene. 

Figs. 5A-5C show a cDNA sequence that encodes Drosophila S2P (SEQ ID NO:3). 

Fig. 6 shows the predicted amino acid sequence (SEQ ID NO:4) of the polypeptide 
encoded by the Drosophila S2P gene. 

Figs. 7A-7F show the cDNA sequence that encodes Drosophila SCAP (SEQ ID 
NO:5). 

Fig. 8 shows the predicted amino acid sequence (SEQ ID NO:6) of the polypeptide 
encoded by the Drosophila SCAP gene. 

Figs. 9A-9E show the nucleic acid sequence encoding Drosophila SREBP 
(GI079656; SEQ ID NO:7). 

Fig. 10 shows the predicted amino acid sequence (SEQ ID NO:8) of Drosophila 
SREBP. 

DETAILED DESCRIPTION OF THE INVENTION 

The use of invertebrate model organism genetics and related technologies can 
greatly facilitate the elucidation of biological pathways (Scangos, Nat. Biotechnol. (1997) 
15:1220-1221; Margolis and Duyk, Nat. Biotechnol. (1998) 16:311). Of particular use are 
the insect and nematode model organisms, Drosophila melanogaster and C. elegans. An 
extensive search for SREBP pathway nucleic acids and their encoded proteins in C. elegans 
and Drosophila melanogaster was conducted in an attempt to identify new and useful tools 
for probing the function and regulation of the SREBP pathway. Novel SREBP pathway 
nucleic acids and their encoded proteins are identified herein. As used in this description, 
the term "SREBP pathway nucleic acid" refers to a nucleic acid that encodes any one of 
SREBP, SCAP, S1P, and S2P. The newly identified SREBP pathway nucleic acids have 
led to the discovery of several mutant phenotypes that can be used to study the pathways 
involved in lipid and fatty acid metabolism. The use of invertebrate model organisms such 
as Drosophila melanogaster and C. elegans for analyzing the expression and 
mis-expression of SREBP pathway proteins has great advantages over the traditional 
approach of using mammalian cell culture due to the ability to rapidly carry out large-scale, 
systematic genetic screens as well as the ability to screen small molecule libraries directly 
on whole organisms. Thus, the invention provides a superior approach for identifying other 
components involved in the synthesis, activation, control and turnover of SREBP pathway 
proteins. Systematic genetic analysis of the SREBP pathway using invertebrate model 
organisms can lead to the identification of new drug targets, therapeutic agents, diagnostics 
and prognostics useful in the treatment of disorders associated with lipid metabolism. 
Additionally, use of these invertebrate model organisms could lead to the identification and 
validation of pesticide targets directed to components of the SREBP pathway. 

The details of the conditions used for the identification and/or isolation of each 
novel SREBP pathway nucleic acid and protein are described in the Examples section 
below. Various non-limiting embodiments of the invention and applications and uses of 
these novel C. elegans and Drosophila melanogaster SREBP pathway genes and proteins 
are discussed in the following sections. The entire contents of all references cited herein are 
incorporated by reference in their entireties for all purposes. Additionally, the citation of a 
reference in the preceding background section is not an admission of prior art against the 
claims appended hereto. 

Nucleic acids of the SREBP pathway 

The invention relates generally to nucleic acid sequences of the SREBP pathway, 
and more particularly SREBP pathway nucleic acid sequences of C. elegans and Drosophila 
melanogaster, and methods of using these sequences. As described in the Examples below, 
the present invention provides a nucleic acid sequence (SEQ ID NO:1) that was isolated 
from C. elegans and encodes an SREBP homologue referred to herein as "ceSREBP". The 
invention also provides nucleic acid sequences that were isolated from Drosophila 
melanogaster and encode homologues of S2P (dS2P; SEQ ID NO:3) and SCAP (dSCAP; 
SEQ ID NO:5). In addition to the fragments and derivatives of SEQ ID NOs:1, 3, and 5, as 
described in detail below, the invention includes the reverse complements thereof. Also, the 
subject nucleic acid sequences, derivatives and fragments thereof may be RNA molecules 
comprising the nucleotide sequence of any one of SEQ ID NOs:1, 3, and 5 (or derivative or 
fragment thereof) wherein the base U (uracil) is substituted for the base T (thymine). The 
DNA and RNA sequences of the invention can be single- or double-stranded. Thus, the 
term "nucleic acid sequence", as used herein, includes the reverse complement, RNA 
equivalent, DNA or RNA double-stranded sequences, and DNA/RNA hybrids of the 
sequence being described, unless otherwise indicated explicitly or by context. 
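
The reverse complement and RNA equivalent referred to above can be sketched in a few lines of Python. This is a minimal illustration only: the function names are assumptions, and the SRE-1 decamer from the background section is used as a stand-in input because the sequences of SEQ ID NOs:1, 3, and 5 are not reproduced here.

```python
# Translation table mapping each DNA base to its Watson-Crick partner.
_DNA_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(dna):
    """Complement every base, then reverse so the result reads 5' to 3'."""
    return dna.translate(_DNA_COMPLEMENT)[::-1]


def rna_equivalent(dna):
    """Substitute U (uracil) for T (thymine), leaving other bases unchanged."""
    return dna.replace("T", "U").replace("t", "u")


if __name__ == "__main__":
    example = "ATCACCCCAC"  # stand-in sequence, not one of the SEQ ID NOs
    print(reverse_complement(example))
    print(rna_equivalent(example))
```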
Fragments of these sequences can be used for a variety of purposes, for example, as 
nucleic acid hybridization probes and replication/amplification primers. Certain "antisense" 
fragments, i.e. those that are reverse complements of the sequences set forth in any one of SEQ ID 
NOs:1, 3, and 5, have utility in inhibiting the function of SREBP pathway proteins. The 
fragments are of length sufficient to specifically hybridize with the corresponding SEQ ID 
NO:1, 3, or 5. In particular, the invention provides fragments of at least 12, preferably at 
least 24, more preferably at least 36, and more preferably at least 96 contiguous nucleotides 
of any one of SEQ ID NOs:1, 3, and 5. In some embodiments, fragments of at least 200 or 
500 nucleotides may be preferred. When the fragments are flanked by other nucleic acid 
sequences, the total length of the combined nucleic acid sequence is less than 15 kb, and 
preferably less than 10 kb, more preferably less than 2 kb, and in some embodiments, more 
preferably less than 500 bases. 

Preferred fragments of ceSREBP (SEQ ID NO:1) include those having at least 535 
contiguous nucleotides of SEQ ID NO:1, and more preferably at least 540 nucleotides. In 
another embodiment of the invention, a fragment contains approximately residues 1090 to 
1290 of SEQ ID NO:1, which encodes a bHLH-Zip domain. Other preferred fragments 
comprise any one of the following contiguous sequences of SEQ ID NO:1: nucleotides 
1-85, 70-90, 76-218, 203-223, 208-528, 513-533, 517-637, 623-643, 626-1058, 1043-1063, 
1048-1293, 1279-1299, 1277-1486, 1473-1493, 1477-2016, 2002-2022, 2004-2413, 
2399-2419, 2404-2641, 2627-2647, 2632-2795, 2781-3001, 2786-3156, 3142-3162, and 
3147-3397. 

Preferred fragments of dS2P (SEQ ID NO:3) include those having at least 1226 
contiguous nucleotides of SEQ ID NO:3, and more preferably at least 1231 nucleotides. 
Other preferred fragments comprise any one of the following contiguous sequences of SEQ 
ID NO:3: nucleotides 5-296, 281-301, 287-734, 719-739, and 725-1958. 

Preferred fragments of dSCAP (SEQ ID NO:5) include those having at least 2274 
contiguous nucleotides of SEQ ID NO:5, and more preferably at least 2279 nucleotides. 
Other preferred fragments comprise any one of the following contiguous sequences of SEQ 
ID NO:5: nucleotides 1-160, 150-170, 151-544, 529-549, 526-719, 704-724, 711-2988, 
2974-3004, 2981-3191, 3177-3197, 3182-3546, 3532-3552 and 3537-3765. 
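
The fragment ranges listed above are given as 1-based, inclusive nucleotide positions. The following Python sketch shows one way such a range might be extracted from a sequence, and how a candidate could be checked for a minimum run of contiguous nucleotides shared with a reference. The helper names and the toy sequence are hypothetical; the actual sequences of SEQ ID NOs:1, 3, and 5 would be substituted in practice.

```python
def fragment(seq, start, end):
    """Return nucleotides start..end of seq (1-based, inclusive)."""
    if not (1 <= start <= end <= len(seq)):
        raise ValueError("range %d-%d lies outside the sequence" % (start, end))
    return seq[start - 1:end]


def has_min_contiguous(candidate, reference, min_len):
    """True if some window of min_len contiguous nucleotides of candidate
    occurs verbatim in reference."""
    return any(
        candidate[i:i + min_len] in reference
        for i in range(len(candidate) - min_len + 1)
    )


if __name__ == "__main__":
    toy = "ACGTACGTAC"  # toy stand-in for a SEQ ID NO sequence
    print(fragment(toy, 3, 6))
    print(has_min_contiguous("TTGTACTT", toy, 4))
```

For example, the approximate bHLH-Zip-encoding region mentioned above would correspond to fragment(seq_id_no_1, 1090, 1290), where seq_id_no_1 is an assumed variable holding SEQ ID NO:1.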

Additionally, fragments of any of the foregoing sequences that are double-stranded 
RNA (dsRNA) molecules have utility in RNA interference (RNAi) studies, as described in 
more detail below, where model organisms exhibiting a loss-of-function phenotype are 
generated. Typically, dsRNA molecules for RNAi studies are from about 200 to 2000 bp, 
and are preferably 600-900 bp in size. 



The subject nucleic acid sequences may consist solely of any one of SEQ ID NOs:1, 
3, and 5, or fragments thereof. Alternatively, the subject nucleic acid sequences and 
fragments thereof may be joined to other components such as labels, peptides, agents that 
facilitate transport across cell membranes, hybridization-triggered cleavage agents or 
intercalating agents. The subject nucleic acid sequences and fragments thereof may also be 
joined to other nucleic acid sequences (i.e. they may comprise part of larger sequences) and 
are of synthetic/non-natural sequences and/or are isolated, i.e. unaccompanied by at least 
some of the material with which they are associated in their natural state. Preferably, the isolated 
nucleic acids constitute at least about 0.5%, and more preferably at least about 5% by 
weight of the total nucleic acid present in a given fraction, and are preferably recombinant, 
meaning that they comprise a non-natural sequence or a natural sequence joined to 
nucleotide(s) other than that which it is joined to on a natural chromosome. 

The invention also provides derivative nucleic acid sequences which hybridize to the 
nucleic acid sequence of any one of SEQ ID NOs:1, 3, and 5 under stringency conditions 
such that each hybridizing derivative nucleic acid is related to the subject nucleic acid by a 
certain degree of sequence identity. In a specific embodiment, the derivative nucleic acid 
hybridizes to the reverse complement of SEQ ID NO:1, 3 or 5 and has the antigenicity of a 
polypeptide encoded by SEQ ID NO:2, 4, or 6, respectively. The temperature and salt 
concentrations at which hybridizations are performed have a direct effect on the results that 
are obtained. With "stringent" or "high stringency" conditions, a denaturing agent, such as 
formamide, is used during hybridization. The formamide is typically used at 25% to 50% 
(v/v) in a buffered diluent comprising 1X to 6X SSC (1X SSC is 150 mM NaCl and 15 mM 
sodium citrate; SSPE may be substituted for SSC, 1X SSPE is 150 mM NaCl, 10 mM 
NaH2PO4, and 1.25 mM EDTA, pH 7.4). The hybridization temperature is typically about 
42 °C. High stringency conditions also employ a wash buffer with low ionic strength, such 
as 0.1X to about 0.5X SSC, at relatively high temperature, typically greater than about 
55 °C up to about 70 °C. Moderately stringent conditions typically use 0% to 25% 
formamide in 1X to 6X SSC, and use reduced hybridization temperatures, usually in the 
range of about 27 °C to about 40 °C. The wash buffer can have increased ionic strength, e.g. 
about 0.6X to about 2X SSC, and is used at reduced temperatures, usually from about 45 °C 
to about 55 °C. With "non-stringent" or "low stringency" hybridization conditions, the 
hybridization buffer is the same as that used for moderately stringent or high stringency, but 
does not contain a denaturing agent. A reduced hybridization temperature is used, typically 
in the range of about 25 °C to about 30 °C. The wash buffer has increased ionic strength, 
usually around 2X to about 6X SSC, and the wash temperature is in the range of about 
35 °C to about 47 °C. Procedures for nucleic acid hybridizations are well-known in the art 
(Ausubel et al., Current Protocols in Molecular Biology (1995) Wiley Interscience 
Publishers; Sambrook et al., Molecular Cloning: A Laboratory Manual (1989) Cold Spring 
Harbor Press, New York; Shilo and Weinberg, Proc. Natl. Acad. Sci. U.S.A. (1981) 
78:6789-6792). 
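
Because the buffers above are specified as multiples of 1X SSC, the salt concentrations at any strength follow by simple scaling. The Python sketch below applies the 1X definition given in the text (150 mM NaCl and 15 mM sodium citrate); the function name ssc_composition is an illustrative assumption.

```python
# Composition of 1X SSC in millimolar, as defined in the text.
SSC_1X_MM = {"NaCl": 150.0, "sodium citrate": 15.0}


def ssc_composition(strength):
    """Return the millimolar composition of an SSC buffer at the given
    strength, e.g. strength=6 for 6X SSC or strength=0.5 for 0.5X SSC."""
    if strength <= 0:
        raise ValueError("strength must be positive")
    return {salt: mm * strength for salt, mm in SSC_1X_MM.items()}


if __name__ == "__main__":
    for strength in (6, 2, 0.5):
        print(strength, ssc_composition(strength))
```

Lower multiples mean lower ionic strength, which is why the high stringency washes described above use the most dilute SSC.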


In a specific embodiment of the invention, nucleic acids are provided that are 
capable of hybridizing to any one of SEQ ID NOs:1, 3, and 5, or the above-specified 
fragments thereof, under any one of the hybridization conditions listed in Table 1. 
Hybridization conditions 8-10, as listed in Table 1, are generally considered "high 
stringency" conditions; conditions 4-7 are generally considered "moderately stringent", and 
conditions 1-3 are considered "non-stringent". 



TABLE 1

Condition #   Hybridization Buffer       Hybridization Temp.   Wash Buffer   Wash Temp.
1             6X SSC / 0% formamide      25°C                  4X SSC        35°C
2             6X SSC / 0% formamide      25°C                  4X SSC        40°C
3             6X SSC / 0% formamide      27°C                  4X SSC        47°C
4             6X SSC / 0% formamide      34°C                  2X SSC        45°C
5             6X SSC / 0% formamide      40°C                  0.8X SSC      45°C
6             3X SSC / 0% formamide      40°C                  0.6X SSC      50°C
7             1X SSC / 0% formamide      40°C                  0.6X SSC      55°C
8             6X SSC / 25% formamide     42°C                  0.5X SSC      60°C
9             2X SSC / 25% formamide     42°C                  0.4X SSC      65°C
10            1X SSC / 25% formamide     42°C                  0.3X SSC      70°C



Condition #1 shown in Table 1 is designed to isolate nucleic acids having at least about 
50% sequence identity with the target nucleic acid (with % identity calculated as described 
below). With each subsequent condition, the stringency is such that the isolated nucleic 
acid has a sequence identity of at least 5% greater than what would be isolated by using the 
next lower condition number. Thus, for example, condition #2 is designed to isolate nucleic 
acids having at least about 55% sequence identity with the target nucleic acid, and 
conditions #9 and #10 are designed to isolate nucleic acids having at least about 90% and 
95% sequence identity, respectively, to the target nucleic acid. Preferably, each hybridizing 
derivative nucleic acid has a length that is at least 30% of the length of the subject nucleic 
acid sequence described herein to which it hybridizes. More preferably, the hybridizing 
nucleic acid has a length that is at least 50%, still more preferably at least 70%, and most 
preferably at least 90% of the length of the subject nucleic acid sequence to which it 
hybridizes. 
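
The relationship between condition number, stringency class, and approximate minimum sequence identity described above can be restated as a short Python sketch. The function names are illustrative assumptions; the arithmetic simply encodes "about 50% at condition #1, plus 5% for each subsequent condition".

```python
def min_identity_percent(condition):
    """Approximate minimum % identity targeted by a Table 1 condition."""
    if not 1 <= condition <= 10:
        raise ValueError("Table 1 lists conditions 1 through 10")
    return 50 + 5 * (condition - 1)


def stringency_class(condition):
    """Stringency label the text assigns to a Table 1 condition number."""
    if not 1 <= condition <= 10:
        raise ValueError("Table 1 lists conditions 1 through 10")
    if condition <= 3:
        return "non-stringent"
    if condition <= 7:
        return "moderately stringent"
    return "high stringency"


if __name__ == "__main__":
    for n in range(1, 11):
        print(n, stringency_class(n), min_identity_percent(n))
```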

As used herein, "percent (%) nucleic acid sequence identity" with respect to a 
subject sequence, or a specified portion of a subject sequence, is defined as the percentage 
of nucleotides in the candidate derivative nucleic acid sequence identical with the 
nucleotides in the subject sequence (or specified portion thereof), after aligning the 
sequences and introducing gaps, if necessary to achieve the maximum percent sequence 
identity, as generated by the program WU-BLAST-2.0a19 (Altschul et al., J. Mol. Biol. 
(1997) 215:403-410; http://blast.wustl.edu/blast/README.html) with all the search 
parameters set to default values. The HSP S and HSP S2 parameters are dynamic values 
and are established by the program itself depending upon the composition of the particular 
sequence and composition of the particular database against which the sequence of interest 
is being searched. A % nucleic acid sequence identity value is determined by the number of 
matching identical nucleotides divided by the sequence length for which the percent identity 
is being reported. Preferably, derivative nucleic acid sequences of the present invention 
have at least 70%, preferably at least 80%, more preferably at least 85%, still more 
preferably at least 90%, and most preferably at least 95% sequence identity with any one of 
SEQ ID NOs:1, 3, and 5. In some preferred embodiments, the derivative nucleic acid 
encodes a polypeptide comprising an amino acid sequence of any one of SEQ ID NOs:2, 4, 
and 6, or a functionally active fragment thereof. 
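
The final arithmetic step of the definition above (matching identical nucleotides divided by the sequence length being reported) can be illustrated with a minimal Python sketch. It assumes the two sequences have already been aligned, with "-" marking gaps, and it takes the ungapped subject length as the reported length; it does not reproduce the WU-BLAST alignment itself, and the function name is hypothetical.

```python
def percent_identity(aligned_subject, aligned_candidate, gap="-"):
    """Percent identity of two pre-aligned sequences of equal length.

    Matches are counted only at columns where both sequences carry the same
    nucleotide; the denominator is the ungapped length of the subject.
    """
    if len(aligned_subject) != len(aligned_candidate):
        raise ValueError("aligned sequences must have equal length")
    matches = sum(
        1
        for s, c in zip(aligned_subject.upper(), aligned_candidate.upper())
        if s == c and s != gap
    )
    subject_length = len(aligned_subject) - aligned_subject.count(gap)
    if subject_length == 0:
        raise ValueError("subject sequence is empty")
    return 100.0 * matches / subject_length


if __name__ == "__main__":
    print(percent_identity("ACGT-ACGT", "ACGTTAC-T"))
```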

A derivative of the subject nucleic acid sequence, or fragment thereof, may 
comprise 100% sequence identity with the subject nucleic acid sequence, but be a derivative 
thereof in the sense that it has one or more modifications at the base or sugar moiety, or 
phosphate backbone. Examples of modifications are well known in the art (Bailey J.E., 
Ullmann's Encyclopedia of Industrial Chemistry (1998), 6th ed. Wiley and Sons). Such 
derivatives may be used to provide modified stability or any other desired property. 

Another type of derivative of the subject nucleic acid sequences includes 
corresponding humanized sequences. A humanized nucleic acid sequence is one in which 
one or more codons has been substituted with a codon that is more commonly used in 
human genes. The following list shows, for each amino acid, the calculated codon 
frequency (number in parentheses) in human genes for 1000 codons (Wada et al., Nucleic 
Acids Research (1990) 18(Suppl.):2367-2411): 

Human codon frequency per 1000 codons:

ARG: CGA (5.4), CGC (11.3), CGG (10.4), CGU (4.7), AGA (9.9), AGG (11.1)
LEU: CUA (6.2), CUC (19.9), CUG (42.5), CUU (10.7), UUA (5.3), UUG (11.0)
SER: UCA (9.3), UCC (17.7), UCG (4.2), UCU (13.2), AGC (18.7), AGU (9.4)
THR: ACA (14.4), ACC (23.0), ACG (6.7), ACU (12.7)
PRO: CCA (14.6), CCC (20.0), CCG (6.6), CCU (15.5)
ALA: GCA (14.0), GCC (29.1), GCG (7.2), GCU (19.6)
GLY: GGA (17.1), GGC (25.4), GGG (17.3), GGU (11.2)
VAL: GUA (5.9), GUC (16.3), GUG (30.9), GUU (10.4)
LYS: AAA (22.2), AAG (34.9)
ASN: AAC (22.6), AAU (16.6)
GLN: CAA (11.1), CAG (33.6)
HIS: CAC (14.2), CAU (9.3)
GLU: GAA (26.8), GAG (41.4)
ASP: GAC (29.0), GAU (21.7)
TYR: UAC (18.8), UAU (12.5)
CYS: UGC (14.5), UGU (9.9)
PHE: UUU (22.6), UUC (15.8)
ILE: AUA (5.8), AUC (24.3), AUU (14.9)
MET: AUG (22.3)
TRP: UGG (13.8)
TER: UAA (0.7), UAG (0.5), UGA (1.2)









Thus, an SREBP pathway nucleic acid sequence in which the glutamic acid codon GAA 
has been replaced with the codon GAG, which is more commonly used in human genes, is 
an example of a humanized SREBP pathway nucleic acid sequence. A detailed discussion 
of the humanization of nucleic acid sequences is provided in U.S. Pat. No. 5,874,304 to 
Zolotukhin et al. 
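
The codon substitution described above can be sketched in Python as follows. This is a minimal illustration, not the method of U.S. Pat. No. 5,874,304: it replaces each codon with the most frequent synonymous codon, uses only a few amino acids transcribed from the frequency list above (for brevity), and leaves any codon outside that subset unchanged. The names HUMAN_CODON_FREQ and humanize are assumptions made for this example.

```python
# Subset of the human codon frequencies (per 1000 codons) listed above.
HUMAN_CODON_FREQ = {
    "ARG": {"CGA": 5.4, "CGC": 11.3, "CGG": 10.4, "CGU": 4.7, "AGA": 9.9, "AGG": 11.1},
    "LEU": {"CUA": 6.2, "CUC": 19.9, "CUG": 42.5, "CUU": 10.7, "UUA": 5.3, "UUG": 11.0},
    "LYS": {"AAA": 22.2, "AAG": 34.9},
    "GLN": {"CAA": 11.1, "CAG": 33.6},
    "GLU": {"GAA": 26.8, "GAG": 41.4},
    "MET": {"AUG": 22.3},
}

# Reverse lookup (codon -> amino acid) and the preferred codon per amino acid.
CODON_TO_AA = {
    codon: aa for aa, codons in HUMAN_CODON_FREQ.items() for codon in codons
}
PREFERRED = {
    aa: max(codons, key=codons.get) for aa, codons in HUMAN_CODON_FREQ.items()
}


def humanize(coding_seq):
    """Replace each codon with the most frequent human synonym, if known.

    Accepts DNA or RNA; output uses the RNA alphabet to match the list above.
    Codons not covered by the subset table (including stops) are kept as is.
    """
    rna = coding_seq.upper().replace("T", "U")
    if len(rna) % 3 != 0:
        raise ValueError("coding sequence length must be a multiple of 3")
    out = []
    for i in range(0, len(rna), 3):
        codon = rna[i:i + 3]
        aa = CODON_TO_AA.get(codon)
        out.append(PREFERRED[aa] if aa is not None else codon)
    return "".join(out)


if __name__ == "__main__":
    print(humanize("AUGGAAUAA"))  # Met-Glu-stop: GAA is swapped for GAG
```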



Isolation, production, and expression of nucleic acids of the SREBP pathway 

Nucleic acid encoding the amino acid sequence of any one of SEQ ID NOs:2, 4, and 
6, may be obtained from an appropriate cDNA library prepared from any eukaryotic species 
that encodes SREBP pathway proteins such as vertebrates, preferably mammalian (e.g. 
primate, porcine, bovine, feline, equine, and canine species, etc.) and invertebrates, such as 
arthropods, particularly insect species (preferably Drosophila melanogaster) and 
arachnids, and nematodes (preferably C. elegans). An expression library can be constructed 
using known methods. For example, mRNA can be isolated to make cDNA which is 
ligated into a suitable expression vector for expression in a host cell into which it is 
introduced. Various screening assays can then be used to select for the gene or gene 
product (e.g. oligonucleotides of at least about 20 to 80 bases designed to identify the gene 
of interest, or labeled antibodies that specifically bind to the gene product). 

Polymerase chain reaction (PCR) can also be used to isolate nucleic acids of the 
SREBP pathway where oligonucleotide primers representing fragmentary sequences of 
interest amplify RNA or DNA sequences from a source such as a genomic or cDNA library 
(as described by Sambrook et al., supra). Additionally, degenerate primers for amplifying 
homologues from any species of interest may be used. Once a PCR product of appropriate 
size and sequence is obtained, it may be cloned and sequenced by standard techniques, and 
utilized as a probe to isolate a complete cDNA or genomic clone. 
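
A degenerate primer, as mentioned above, is a pool of oligonucleotides written with IUPAC ambiguity codes. The Python sketch below expands such a primer into its individual sequences and reports the pool size. The example primer "GARY" is hypothetical and is not taken from the disclosure; only the standard IUPAC nucleotide codes are assumed.

```python
from itertools import product

# Standard IUPAC nucleotide ambiguity codes (DNA alphabet).
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def expand_degenerate(primer):
    """List every concrete sequence encoded by a degenerate primer."""
    choices = [IUPAC[base] for base in primer.upper()]
    return ["".join(combo) for combo in product(*choices)]


def degeneracy(primer):
    """Number of distinct sequences in the degenerate primer pool."""
    total = 1
    for base in primer.upper():
        total *= len(IUPAC[base])
    return total


if __name__ == "__main__":
    example = "GARY"  # hypothetical primer used only for illustration
    print(degeneracy(example), expand_degenerate(example))
```

Keeping the degeneracy low is a common design consideration, since each added ambiguous position multiplies the number of sequences in the pool.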
Fragmentary sequences of SEQ ID NOs:1, 3 and 5 may be synthesized by known 
methods. For example, oligonucleotides may be synthesized using an automated DNA 
synthesizer available from commercial suppliers (e.g. Biosearch, Novato, CA; Perkin-Elmer 
Applied Biosystems, Foster City, CA). Antisense RNA sequences can be produced 
intracellularly by transcription from an exogenous sequence, e.g. from vectors that contain 
antisense SREBP pathway nucleic acid sequences. Newly generated sequences may be 
identified and isolated using standard methods. 

An isolated SREBP pathway nucleic acid sequence can be inserted into any 
appropriate cloning vector, for example bacteriophages such as lambda derivatives, or 
plasmids such as pBR322, pUC plasmid derivatives and the Bluescript vector (Stratagene, 
San Diego, CA). Recombinant molecules can be introduced into host cells via 
transformation, transfection, infection, electroporation, etc. The transformed cells can be 
cultured to generate large quantities of the SREBP pathway nucleic acid. Suitable methods 
for isolating and producing the subject nucleic acid sequences are well-known in the art 
(Sambrook et al., supra; Glover (ed.), DNA Cloning: A Practical Approach, Vol. 1, 2, 3, 4, 
(1995) IRL Press, Ltd., Oxford, U.K.). 

The nucleotide sequence coding an SREBP pathway protein or a functionally active 
fragment or derivative thereof, can be inserted into any appropriate expression vector for the 
transcription and translation of the inserted protein-coding sequence. Alternatively, the 
necessary transcriptional and translational signals can be supplied by the native SREBP 
pathway gene and/or its flanking regions. A variety of host-vector systems may be utilized 
to express the protein-coding sequence such as mammalian cell systems infected with virus 
(e.g. vaccinia virus, adenovirus, etc.); insect cell systems infected with virus (e.g. 
baculovirus); microorganisms such as yeast containing yeast vectors, or bacteria 
transformed with bacteriophage DNA, plasmid DNA, or cosmid DNA. Expression of an 
SREBP pathway protein may be controlled by a suitable promoter/enhancer element. In 
addition, a host cell strain may be selected which modulates the expression of the inserted 
sequences, or modifies and processes the gene product in the specific fashion desired. 

In a specific embodiment, a vector is used that comprises a promoter operably linked 
to an SREBP pathway gene nucleic acid, one or more origins of replication, and optionally, 
one or more selectable markers (e.g. thymidine kinase activity, resistance to antibiotics, 
etc.) so that expression of the gene product can be detected. Alternatively, recombinant 
expression vectors can be identified by assaying for the expression of the SREBP pathway 
gene product based on the physical or functional properties of the SREBP pathway protein 
in in vitro assay systems (e.g. immunoassays). 

In specific embodiments, the SREBP pathway protein, fragment, or derivative may 
be expressed as a fusion, or chimeric protein product (i.e. it is joined via a peptide bond to a 
heterologous protein sequence of a different protein). Such a chimeric product can be made 
by ligating the appropriate nucleic acid sequences encoding the desired amino acid 
sequences to each other in the proper coding frame (using methods known in the art) and 
expressing the chimeric product. Alternatively, such a chimeric product may be made by 
protein synthetic techniques, e.g. by use of a peptide synthesizer. 

Once a recombinant which expresses the SREBP pathway gene sequence is 
identified, the gene product can be isolated using standard methods (e.g. ion exchange, 
affinity, and sizing column chromatography; centrifugation; differential solubility). The 
amino acid sequence of the protein can be deduced from the nucleotide sequence of the 
chimeric gene contained in the recombinant. As a result, the protein can be synthesized by 
standard chemical methods known in the art (Hunkapiller et al., Nature (1984) 
310:105-111). Alternatively, native SREBP-pathway proteins can be purified from natural 
sources, by standard methods (e.g. immunoaffinity purification). 

SREBP pathway proteins 

The invention provides SREBP pathway proteins that comprise or consist of an 
amino acid sequence of any one of SEQ ID NOs: 2, 4, and 6, or fragments or derivatives 
thereof. Compositions comprising these proteins may consist essentially of the SREBP 
pathway proteins. Alternatively, the SREBP pathway proteins may be a component of a 
composition that comprises other components (e.g. a diluent such as saline, a 
pharmaceutically acceptable carrier or excipient, a culture medium, carriers used in 
pesticide formulations, etc.). 

Typically, a derivative of an SREBP pathway protein will share a certain degree of 
sequence identity or sequence similarity with any one of SEQ ID NOs 2, 4, and 6, or a 
fragment thereof. As used herein, "percent (%) amino acid sequence identity" with respect 
to a subject sequence, or a specified portion of a subject sequence, is defined as the 
percentage of amino acids in the candidate derivative amino acid sequence identical with 
the amino acid in the subject sequence (or specified portion thereof), after aligning the 
sequences and introducing gaps, if necessary to achieve the maximum percent sequence 
identity, as generated by the program WU-BLAST-2.0a19 (Altschul et al., supra) using the 
same parameters discussed above for derivative nucleic acid sequences. A % amino acid 
sequence identity value is determined by the number of matching identical amino acids 
divided by the sequence length for which the percent identity is being reported. Preferably, 
derivative amino acid sequences of the present invention have at least 80%, preferably at 
least 85%, more preferably at least 90%, and most preferably at least 95% sequence identity 
with any contiguous stretch of at least 20 amino acids, preferably at least 25 amino acids, 
and more preferably at least 30 amino acids of any one of SEQ ID NOs 2, 4, and 6. 
"Percent (%) amino acid sequence similarity" is determined by doing the same calculation 
as for determining % amino acid sequence identity described above, but including 
conservative amino acid substitutions in addition to identical amino acids in the 
computation. A conservative amino acid substitution is one in which an amino acid is 
substituted for another amino acid having similar properties such that the folding or activity 
of the protein is not significantly affected. Aromatic amino acids that can be substituted for 
each other are phenylalanine, tryptophan, and tyrosine; interchangeable hydrophobic amino 
acids are leucine, isoleucine and valine; interchangeable polar amino acids are glutamine 
and asparagine; interchangeable basic amino acids are arginine, lysine and histidine; 
interchangeable acidic amino acids are aspartic acid and glutamic acid; and interchangeable 
small amino acids are alanine, serine, threonine, methionine, and glycine. 
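
For illustration, the identity and similarity calculations defined above can be sketched as follows. This minimal Python sketch assumes that the two sequences have already been aligned (e.g. by WU-BLAST) and are supplied as equal-length strings with "-" marking gaps, and it assumes that the "sequence length for which the percent identity is being reported" is the number of residues in the aligned portion of the subject sequence. The example sequences are hypothetical.

```python
# Conservative substitution groups as enumerated in the text above.
CONSERVATIVE_GROUPS = [
    set("FWY"),    # aromatic
    set("LIV"),    # hydrophobic
    set("QN"),     # polar
    set("RKH"),    # basic
    set("DE"),     # acidic
    set("ASTMG"),  # small
]

def is_conservative(a, b):
    """True if residues a and b fall within the same substitution group."""
    return any(a in group and b in group for group in CONSERVATIVE_GROUPS)

def percent_identity_and_similarity(subject, candidate, gap="-"):
    """Return (% identity, % similarity) for two pre-aligned sequences.

    Assumption: the denominator is the number of non-gap residues in the
    aligned subject sequence.
    """
    if len(subject) != len(candidate):
        raise ValueError("aligned sequences must have equal length")
    subject, candidate = subject.upper(), candidate.upper()
    reported_length = sum(1 for s in subject if s != gap)
    if reported_length == 0:
        raise ValueError("subject contains no residues")
    identical = similar = 0
    for s, c in zip(subject, candidate):
        if s == gap or c == gap:
            continue  # gapped columns count as neither identical nor similar
        if s == c:
            identical += 1
            similar += 1
        elif is_conservative(s, c):
            similar += 1
    return (100.0 * identical / reported_length,
            100.0 * similar / reported_length)

# Hypothetical aligned pair: 5 identical and 2 conservative positions
# over 7 subject residues.
identity, similarity = percent_identity_and_similarity("MKLVDE-R", "MRLIDEAR")
print(round(identity, 1), round(similarity, 1))
```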

A preferred derivative of ceSREBP consists of or comprises an amino acid sequence 
that has at least 55%, preferably at least 60%, and more preferably, at least 65% sequence 
identity with amino acid residues 335-428 of SEQ ID NO:2 (i.e. the bHLH-Zip domain). 
Other preferred derivatives of ceSREBP consist of or comprise an amino acid sequence that 
shares at least 75% similarity, preferably at least 80% similarity, and more preferably, at 
least 85% similarity with amino acid residues 335-428 of SEQ ID NO:2. Preferably, such 
derivatives share antigenicity with amino acid residues 335-428 of SEQ ID NO:2. 

The invention also provides proteins having amino acid sequences that consist of or 
comprise a fragment of any one of SEQ ID NOs 2, 4, and 6. The fragments usually have at 
least 10, preferably at least 12, and more preferably at least 15 contiguous amino acids of 
any one of SEQ ID NOs 2, 4, and 6. A preferred fragment of ceSREBP contains at least 8, 
preferably at least 10, and more preferably at least 12 contiguous amino acids of residues 
335 to 428 of SEQ ID NO:2. 

Preferably the fragment or derivative of the SREBP pathway protein is "functionally 
active" meaning that the SREBP pathway protein derivative or fragment exhibits one or 
more functional activities associated with a full-length, wild-type SREBP pathway protein 
comprising the amino acid sequence of any one of SEQ ID NOs:2, 4, and 6. As an 
example, functionally active SREBP pathway protein fragments or derivatives include 
polypeptides that have the antigenicity of the SREBP pathway protein such that they can be 
used in immunoassays, for immunization, for inhibition of SREBP pathway activity, etc. 
As another example, a fragment or derivative of SREBP may be considered functionally 
active if it binds a regulatory DNA element of an appropriate target gene such as the SRE-1 
sequence. S2P may be considered functionally active if it cleaves SREBP at site 2 (as 
depicted in Fig. 1B), etc. A fragment or derivative of SCAP may be considered 
functionally active if it is capable of binding to the C-terminal regulatory domain of SREBP. 
Fragments or derivatives of SREBP pathway proteins can be tested for functional activity 
by various procedures known in the art. In a preferred method which is described in detail 
below, a model organism, such as an insect (e.g. D. melanogaster) or worm (e.g. C. 
elegans), or other model system, is used in genetic studies to assess the phenotypic effect of 
a fragment or derivative (i.e. mutant). As used herein, functionally active fragments also 
include polypeptides that are lacking one or more structural or functional domains of an 
SREBP pathway protein. Examples of such domains include transmembrane domains, 
cytosolic domains, lumenal domains, regulatory domains, etc. Thus, for example, an 
SREBP polypeptide lacking the N-terminal acidic region and/or the C-terminal regulatory 
region, is considered a functionally-active fragment. 

The SREBP pathway derivatives of the invention can be produced by various 
methods known in the art. The manipulations which result in their production can occur at 
the gene or protein level. For example, a cloned SREBP pathway gene sequence can be 
cleaved at appropriate sites with restriction endonuclease(s) (Wells et al., Philos. Trans. R. 
Soc. London SerA (1986) 317:415), followed by further enzymatic modification if desired, 
isolated, and ligated in vitro, and expressed to produce the desired derivative. Alternatively, 
an SREBP pathway gene can be mutated in vitro or in vivo, to create and/or destroy 
translation, initiation, and/or termination sequences, or to create variations in coding regions 
and/or to form new restriction endonuclease sites or destroy preexisting ones, to facilitate 
further in vitro modification. A variety of mutagenesis techniques are known in the art such 
as chemical mutagenesis, in vitro site-directed mutagenesis (Carter et al., Nucl. Acids Res. 
(1986) 13:4331), use of TAB® linkers (available from Pharmacia and Upjohn, Kalamazoo, 
MI), etc. 

At the protein level, manipulations include post translational modification, e.g. 
glycosylation, acetylation, phosphorylation, amidation, derivatization by known 
protecting/blocking groups, proteolytic cleavage, linkage to an antibody molecule or other 
cellular ligand, etc. Any of numerous chemical modifications may be carried out by known 
techniques (e.g. specific chemical cleavage by cyanogen bromide, trypsin, chymotrypsin, 
papain, V8 protease, NaBH4, acetylation, formylation, oxidation, reduction, metabolic 
synthesis in the presence of tunicamycin, etc.). Derivative proteins can also be chemically 
synthesized by use of a peptide synthesizer, for example to introduce nonclassical amino 
acids or chemical amino acid analogs as substitutions or additions into the SREBP pathway 
protein sequence. 

Chimeric or fusion proteins can be made comprising an SREBP pathway protein or 
fragment thereof (preferably consisting of at least a domain or motif of the SREBP pathway 
protein, or at least 6, and preferably at least 10 amino acids of the SREBP pathway protein) 
joined at its amino- or carboxy-terminus via a peptide bond to an amino acid sequence of a 
different protein. Such a chimeric protein can be produced by any known method, 
including: recombinant expression of a nucleic acid encoding the protein (comprising an 
SREBP pathway-coding sequence joined in-frame to a coding sequence for a different 
protein); ligating the appropriate nucleic acid sequences encoding the desired amino acid 
sequences to each other in the proper coding frame, and expressing the chimeric product; 
and protein synthetic techniques, e.g. by use of a peptide synthesizer. 
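
As a non-limiting illustration of joining two coding sequences "in the proper coding frame", the following minimal Python sketch checks that each coding sequence is a whole number of codons, removes the stop codon of the upstream sequence, and rejects any fusion containing an internal stop codon. The function name and the short sequences are hypothetical; practical constructs would also require attention to linkers, restriction sites, and codon usage, none of which is modeled here.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}

def codons(seq):
    """Split a coding sequence into successive three-base codons."""
    return [seq[i:i + 3] for i in range(0, len(seq), 3)]

def fuse_in_frame(upstream_cds, downstream_cds):
    """Join two coding sequences so the downstream one stays in frame."""
    up, down = upstream_cds.upper(), downstream_cds.upper()
    if len(up) % 3 or len(down) % 3:
        raise ValueError("each coding sequence must be a whole number of codons")
    up_codons = codons(up)
    # Drop the upstream stop codon so translation reads through the junction.
    if up_codons and up_codons[-1] in STOP_CODONS:
        up_codons = up_codons[:-1]
    fused = up_codons + codons(down)
    # Any stop codon before the final position would truncate the fusion.
    if any(codon in STOP_CODONS for codon in fused[:-1]):
        raise ValueError("internal stop codon would truncate the fusion")
    return "".join(fused)

# Hypothetical sequences: ATG GCT TAA + GGA TCC AAA TGA
print(fuse_in_frame("ATGGCTTAA", "GGATCCAAATGA"))
```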

Antibodies to SREBP pathway proteins 

SREBP pathway proteins, including functional derivatives and fragments thereof 
(e.g. an SREBP pathway protein encoded by a sequence of any one of SEQ ID NOs:2, 4, 
and 6, or a subsequence thereof) may be used as an immunogen to generate monoclonal or 
polyclonal antibodies and antibody fragments or derivatives (e.g. chimeric, single chain, 
Fab fragments). For example, antibodies to a particular domain of an SREBP pathway 
protein may be desired (e.g. an SRE binding domain). In a specific embodiment, fragments 
of an SREBP pathway protein identified as hydrophilic are used as immunogens for 
antibody production using art-known methods. Various known methods for antibody 
production can be used including cell culture of hybridomas; production of monoclonal 
antibodies in germ-free animals (PCT/US90/02545); the use of human hybridomas (Cole et 
al., Proc. Natl. Acad. Sci. U.S.A. (1983) 80:2026-2030; Cole et al., in Monoclonal 
Antibodies and Cancer Therapy (1985) Alan R. Liss, pp. 77-96), and production of 
humanized antibodies (Jones et al., Nature (1986) 321:522-525; US Pat No. 5,530,101). 
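
The text above does not specify how hydrophilic fragments are identified. As one possible approach, offered only as a sketch, a sliding-window average over the Kyte-Doolittle hydropathy scale can flag candidate regions. The window size, threshold, and example sequence below are arbitrary assumptions chosen for illustration.

```python
# Kyte-Doolittle hydropathy values (more negative = more hydrophilic).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}

def hydrophilic_windows(protein, window=9, threshold=-1.0):
    """Return (start, end, mean) for windows whose mean hydropathy is at or
    below the threshold. Positions are 1-based and inclusive."""
    protein = protein.upper()
    hits = []
    for i in range(len(protein) - window + 1):
        segment = protein[i:i + window]
        mean = sum(KYTE_DOOLITTLE[aa] for aa in segment) / window
        if mean <= threshold:
            hits.append((i + 1, i + window, round(mean, 2)))
    return hits

# Hypothetical sequence: a lysine-rich stretch flanked by isoleucines.
print(hydrophilic_windows("IIIIIKKKKKIIIII", window=5, threshold=-3.0))
```

Overlapping hits would ordinarily be merged into a single candidate region before a peptide immunogen is chosen.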

Molecules which interact with SREBP pathway proteins 

The present invention provides methods of identifying or screening for molecules, 
such as proteins or other compounds, which interact with SREBP pathway proteins, or 
derivatives, or fragments thereof. Assays to find interacting proteins can be performed by 
any method known in the art, for example, immunoprecipitation with an antibody that binds 
to the protein in a complex followed by analysis by size fractionation of the 
immunoprecipitated proteins (e.g. by denaturing or nondenaturing polyacrylamide gel 
electrophoresis), Western analysis, non-denaturing gel electrophoresis, etc. A preferred 
method for identifying interacting proteins is a two hybrid assay system or variation thereof 
(Fields and Song, Nature (1989) 340:245-246; U.S. Patent No. 5,283,173; for review see 
Brent and Finley, Annu. Rev. Genet. (1997) 31:663-704). 

The most commonly used two-hybrid screen system is performed using yeast. All 
systems share three elements: 1) a gene that directs the synthesis of a "bait" protein fused to 
a DNA binding domain; 2) one or more "reporter" genes having an upstream binding site 
for the bait, and 3) a gene that directs the synthesis of a "prey" protein fused to an activation 
domain that activates transcription of the reporter gene. For the screening of proteins that 
interact with SREBP pathway proteins, the "bait" is preferably an SREBP pathway protein 
having an amino acid sequence of any one of SEQ ID NOs:2, 4, and 6 (or derivative or 
fragment thereof), expressed as a fusion protein to a DNA binding domain. Because the 
fusion proteins of most two-hybrid systems are engineered to enter the nucleus and activate 
transcription, transmembrane portions of proteins can interfere with proper association, 
folding, and nuclear transport of bait or prey segments (Ausubel et al., supra; Allen et al., 
Trends Biochem. Sci. (1995) 20:511-516). Therefore, the "bait" is preferably an SREBP pathway 
protein derivative or a fragment that lacks transmembrane domains. The "prey" protein is a 
protein to be tested for ability to interact with the bait, and is expressed as a fusion protein 
to a transcription activation domain. In one embodiment, the prey proteins can be obtained 
from recombinant biological libraries expressing random peptides. 

The bait fusion protein can be constructed using any suitable DNA binding domain. 
In a preferred system, the bait contains DNA binding and dimerization domains of the E. 
coli LexA repressor protein. LexA binds tightly to several different operators, and carries a 
dimerization domain at its C terminus. In another preferred system, the bait contains 
residues 1-147 of the yeast GAL4 protein which binds tightly to appropriate DNA binding 
sites, localizes fused proteins to the nucleus, and directs dimerization (Bartel et al., 
BioTechniques (1993) 14:920-924; Chasman et al., Mol. Cell. Biol. (1989) 9:4746-4749; 
Ma et al., Cell (1987) 48:847-853; Ptashne et al., Nature (1990) 346:329-331). 

The prey fusion protein can be constructed using any suitable activation domain 
such as GAL4, VP16, etc. In various embodiments the preys contain useful moieties such 
as nuclear localization signals (Ylikomi et al., EMBO J. (1992) 11:3681-3694; Dingwall 
and Laskey, TIBS (1991) 16:479-481) or epitope tags (Allen et al., supra) to facilitate 
isolation of the encoded proteins. Activation tagged proteins also differ in whether they are 
expressed constitutively, or conditionally. In a preferred embodiment, the prey is 
conditionally expressed, allowing the transcription phenotypes obtained in selections (or 
"hunts") for interactors to be ascribed to the synthesis of the tagged protein, thus reducing 
the number of false positive cells that grow because their reporters are aberrantly 
transcribed. 

Any reporter gene can be used that has a detectable phenotype. In various specific 
embodiments, some reporter genes allow cells expressing them to be selected by growth on 
appropriate medium (e.g. HIS3, LEU2 described by Chien et al., Proc. Natl. Acad. Sci. 
U.S.A. (1991) 88:9572-9582; and Gyuris et al., Cell (1993) 75:791-803). Others allow cells 
expressing them to be visually screened such as LacZ and GFP (Chien et al., supra; and 
http://www.bio101.com). Reporters also differ in the number and affinity of upstream 
binding sites (e.g. lexA operators) for the bait, and in the position of these sites relative to 
the transcription start point. Finally, reporter genes differ in the number of molecules of the 
reporter gene product necessary to score the phenotype. These differences affect the 
strength of the protein interactions the reporters can detect. Thus, for example, one or more 
tandem copies (e.g. four or five copies) of the appropriate DNA binding site can be 
introduced upstream of the TATA box in the desired promoter (e.g. in the area of about 
position -100 to about -400). In a preferred aspect, 4 or 5 tandem copies of the 17 bp UAS 
(GAL4 DNA binding site) are introduced upstream of the TATA box in the desired 
promoter, which is upstream of the desired coding sequence for a selectable or detectable 
marker. 

Although the preferred host for two-hybrid screening is the yeast, the host cell in 
which the interaction assay and transcription of the reporter gene occurs can be any cell, 
such as mammalian (e.g. monkey, mouse, rat, human, bovine), chicken, bacterial, or insect 
cells. Expression constructs encoding and capable of expressing the binding domain fusion 
proteins, the transcriptional activation domain fusion proteins, and the reporter gene 
product(s) are provided within the host cell, by mating of cells containing the expression 
constructs, or by cell fusion, transformation, electroporation, microinjection, etc. The host 
cell used should not express an endogenous transcription factor that binds to the same DNA 
site as that recognized by the DNA binding domain fusion population. Also, preferably, the 
host cell is mutant or otherwise lacking in an endogenous, functional form of the reporter 
gene(s) used in the assay. Various vectors and host strains for expression of the two fusion 
protein populations in yeast can be used (U.S. Patent No. 5,468,614; Bartel et al., Cellular 
Interactions in Development (1993) Hartley, ed., Practical Approach Series xviii, IRL Press 
at Oxford University Press, New York, NY, pp. 153-179; and Fields and Sternglanz, Trends 
In Genetics (1994) 10:286-292). As an example of a mammalian system, interaction of 
activation tagged VP16 derivatives with a GAL4-derived bait drives expression of reporters 
that direct the synthesis of Hygromycin B phosphotransferase, Chloramphenicol 
acetyltransferase, or CD4 cell surface antigen (Fearon et al., Proc. Natl. Acad. Sci. U.S.A. 
(1992) 89:7958-7962). In another embodiment, interaction of VP16-tagged derivatives 
with GAL4-derived baits drives the synthesis of SV40 T antigen, which in turn promotes 
the replication of the prey plasmid, which carries an SV40 origin (Vasavada et al., Proc. 
Natl. Acad. Sci. U.S.A. (1991) 88:10686-10690). 

False positives arising from transcriptional activation by the DNA binding domain 
fusion proteins in the absence of a transcriptional activator domain fusion protein can be 
prevented or reduced by negative selection for such activation within a host cell containing 
the DNA binding fusion population, prior to exposure to the activation domain fusion 
population. For example, if such cell contains URA3 as a reporter gene, negative selection 
is carried out by incubating the cell in the presence of 5-fluoroorotic acid (5-FOA), which 
kills self-activating DNA-binding domain hybrids. 

In a preferred embodiment, the bait SREBP pathway gene and the prey library of 
chimeric genes are combined by mating the two yeast strains on solid media for a period of 
approximately 6-8 hours. Alternatively, the mating can be performed in liquid media. The 
resulting diploids contain both kinds of chimeric genes, i.e., the DNA-binding domain 
fusion and the activation domain fusion. 

Transcription of the reporter gene can be detected by a linked replication assay, for 
example, as described by Vasavada et al., supra, or using immunoassay methods, preferably 
as described in Alam and Cook (Anal. Biochem. (1990) 188:245-254). The activation of 
other reporter genes like URA3, HIS3, LYS2, or LEU2 enables the cells to grow in the 
absence of uracil, histidine, lysine, or leucine, respectively, and hence serves as a selectable 
marker. Other types of reporters are monitored by measuring a detectable signal. For 
example, GFP and lacZ have gene products that are fluorescent and chromogenic, 
respectively. 

After interacting proteins have been identified, the DNA sequences encoding the 
proteins can be isolated. In one method, the activation domain sequences or DNA-binding 
domain sequences (depending on the prey hybrid used) are amplified, for example, by PCR 
using pairs of oligonucleotide primers specific for the coding region of the DNA binding 
domain or activation domain. Other known amplification methods can be used, such as 
ligase chain reaction, use of Qβ replicase, or methods described by Kricka et al. (Molecular 
Probing, Blotting, and Sequencing (1995) Academic Press, New York, Chapter 1 and Table 
IX). 

If a shuttle (yeast to E. coli) vector is used to express the fusion proteins, the DNA 
sequences encoding the proteins can be isolated by transforming the yeast DNA into E. coli 
and recovering the plasmids from E. coli. Alternatively, the yeast vector can be isolated, 
and the insert encoding the fusion protein subcloned into a bacterial expression vector, for 
growth of the plasmid in E. coli. 

The two hybrid system can be used for screening candidate molecules that modulate 
interaction between the SREBP pathway protein and the protein with which it interacts. 
Briefly, the protein-protein interaction assay can be carried out by assaying for reporter 
gene activity as described above, except that it is done in the presence of one or more 
candidate molecules. An increase or decrease in reporter gene activity relative to that 
present when the one or more candidate molecules are absent indicates that the candidate 
molecule has an effect on the interacting pair. In a preferred method, inhibition of an 
interaction is selected when the inhibition is necessary for cell survival (e.g. an interaction 
that activates the URA3 gene, causing yeast to die in medium containing the chemical 
5-fluoroorotic acid (Rothstein, Meth. Enzymol. (1983) 101:167-180)). The identification of 
inhibitors of such interactions can also be accomplished using competitive inhibitor assays. 

In vivo and in vitro models of SREBP pathway gene function and dysfunction 

Both genetically modified animal models (i.e. in vivo models), such as C. elegans 
and Drosophila melanogaster, and in vitro models such as genetically engineered cell lines 
expressing or mis-expressing SREBP pathway genes, are useful for studying lipid 
metabolism and disorders associated with abnormal lipid metabolism. Such models that 
display detectable phenotypes, such as those described in more detail below and in the 
examples, can be used for the identification and characterization of SREBP pathway genes 
or other genes of interest and/or phenotypes associated with the mutation or mis-expression 
of an SREBP pathway protein. The term "mis-expression" as used herein encompasses 
mis-expression due to gene mutations. Thus, a mis-expressed SREBP pathway protein may 
be one having an amino acid sequence that differs from wild-type (i.e. it is a derivative of 
the normal protein). A mis-expressed SREBP pathway protein may also be one in which 
one or more amino acids have been deleted, and thus is a "fragment" of the normal protein. 
As used herein, "mis-expression" also includes over-expression (e.g. by multiple gene 
copies), underexpression, and non-expression (e.g. by gene knockout or blocking expression 
that would otherwise normally occur). As used in the following discussion concerning in 
vivo and in vitro models, the term "gene of interest" refers to an SREBP pathway gene (i.e. 
SREBP, SCAP, S1P, and S2P), or any gene involved in regulation or modulation of the 
SREBP pathway. Such genes may include any gene involved in the biosynthesis or 
metabolism of cholesterol or fatty acids such as HMG coenzyme A synthase, HMG-CoA 
reductase, farnesyl diphosphate synthase, squalene synthase, fatty acid synthase, 
acetyl-CoA carboxylase, glycerol-3-phosphate acyltransferase, acyl-CoA binding protein, 
stearoyl CoA desaturase-1, lipoprotein lipase, and the LDL receptor. 

The in vivo and in vitro models may be genetically engineered or modified so that 
they 1) have deletions and/or insertions of one or more SREBP pathway genes, 2) harbor 
interfering RNA sequences derived from SREBP pathway genes, 3) have had one or more 
endogenous SREBP pathway genes mutated, and/or 4) contain transgenes for 
mis-expression of wild-type or mutant forms of such genes. Such genetically modified in 
vivo and in vitro models are useful for identification of new genes that are involved in the 
synthesis, activation, control, etc. of SREBP pathway genes and/or gene products. Further, 
other genes of interest that are involved in cholesterol and/or fatty acid biosynthesis or 
metabolism may be identified. The newly identified genes could constitute possible 
pesticide targets (as judged by animal model phenotypes such as non-viability, block of 
normal development, defective feeding, defective movement, or defective reproduction). 
Alternatively, or additionally, they may constitute possible therapeutic targets, particularly 
in the area of metabolic diseases and disorders, for example, cholesterol synthesis, 
metabolism, and other fatty acid disorders. The model systems can also be used for testing 
potential pesticidal or pharmaceutical compounds that interact with the SREBP pathway, 
for example by administering the compound to the model system using any suitable method 
(e.g. direct contact, ingestion, injection, etc.) and observing any changes in phenotype, for 
example, changes in lipid content, lethality, etc. 

A variety of known expression modification methods can be used to genetically 
modify the animal models and cell cultures so that they express or mis-express SREBP 
pathway proteins. Some specific examples include radiation mutagenesis such as X-rays, 
gamma rays, and ultraviolet radiation; chemical mutagenesis using, for example, 
ethyl methanesulfonate (EMS), methyl methanesulfonate (MMS), N-ethyl-N-nitrosourea 
(ENU), triethylmelamine, diepoxyalkanes, ICR-170, or formaldehyde; double-stranded 
RNA interference; use of peptide and RNA aptamers; transposon mutagenesis; and 
transgene-mediated mis-expression. For some applications, it is useful to use genetic 
modification techniques that result in inheritable expression or mis-expression patterns such 
that the progeny of the genetically-modified animals can be studied. Various genetic 
modification techniques are discussed in more detail below and in the Examples. 

Chemical mutagenesis 

A commonly-used chemical mutagen for creating loss-of-function mutations is ethyl 
methanesulfonate (EMS). In C. elegans, EMS mutagenesis can result in small deletions at a 
rate of approximately 13%. Accordingly, there is about a 95% probability of identifying a 
deletion in a gene of interest by screening 4 x 10^6 EMS-mutagenized genomes. Briefly, 
several million nematodes are mutagenized with EMS using the procedure described by 
Sulston and Hodgkin (The nematode Caenorhabditis elegans (1988) Wood, Ed., Cold 
Spring Harbor Laboratory Press, Cold Spring Harbor, New York, pp. 587-606). The 
mutagenized nematodes are then distributed in small pools in 96-well plates, each pool 
composed of approximately 400 haploid genomes. A portion of each pool is used to 
generate a corresponding library of genomic DNA derived from the mutagenized 
nematodes. The DNA library is screened with a PCR assay to identify pools that carry 
genomes with deletions of interest. Mutant worms carrying the desired deletions are 
recovered from the corresponding pools of the mutagenized animals. Although EMS is a 
preferred mutagen to generate deletions, other mutagens can be used that also provide a 
significant yield of deletions, such as X-rays, gamma-rays, diepoxybutane, formaldehyde 
and trimethylpsoralen with ultraviolet light. 
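
The screening figures given above can be related to one another by simple arithmetic. The following Python sketch is illustrative only: the text does not state how the 95% figure was derived, so the sketch assumes a Poisson model in which deletions in a given gene arise independently at a constant frequency per mutagenized genome. Under that assumption it back-calculates the per-genome deletion frequency implied by a 95% detection probability in 4 x 10^6 genomes, and it computes the number of pools and 96-well plates needed at approximately 400 haploid genomes per pool.

```python
import math

def implied_deletion_frequency(genomes_screened, probability):
    """Per-genome frequency f such that P(at least one deletion) equals
    `probability`, assuming P = 1 - exp(-N * f) (Poisson assumption)."""
    return -math.log(1.0 - probability) / genomes_screened

def pooling_plan(genomes_screened, genomes_per_pool=400, wells_per_plate=96):
    """Number of pools, and of plates at one pool per well."""
    pools = math.ceil(genomes_screened / genomes_per_pool)
    plates = math.ceil(pools / wells_per_plate)
    return pools, plates

frequency = implied_deletion_frequency(4e6, 0.95)
pools, plates = pooling_plan(4e6)
print(f"implied deletion frequency per genome: {frequency:.2e}")
print(f"pools: {pools}, 96-well plates: {plates}")
```

Under these assumptions the screen corresponds to 10,000 pools, or 105 plates, and an implied frequency of roughly 7.5 x 10^-7 deletions in the gene of interest per mutagenized genome.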

Chemical mutagenesis methods, and other methods, for generating loss-of-function 
mutations in D. melanogaster are described by Ashburner (Drosophila: A Laboratory 
Manual (1989) Cold Spring Harbor, NY, Cold Spring Harbor Laboratory Press). 

RNA-mediated interference with gene expression 

RNA-mediated interference (RNAi) is an effective method for generating loss-of-function 
phenotypes. Loss-of-function phenotypes can be generated by injecting antisense 
RNA that is partially homologous to a gene of interest into embryos using methods 
described by Schubiger and Edgar (Methods in Cell Biology (1994) 44:697-713). Another 
antisense RNA methodology involves expression of an antisense RNA partially 
homologous to the gene of interest by operably joining a portion of the gene in the antisense 
orientation to a powerful promoter (such as heat shock gene promoters, or promoters 
controlled by potent exogenous transcription factors, such as GAL4 and tTA) that can drive 
the expression of large quantities of the antisense RNA. Antisense RNA-generated 
loss-of-function phenotypes have been reported for several Drosophila genes (LaBonne et al., Dev. 
Biol. (1989) 136(1):1-16; Schuh and Jäckle, Genome (1989) 31(1):422-5; and Geisler et al., 
Cell (1992) 71(4):613-21). 

Loss-of-function phenotypes can also be generated by cosuppression methods where
a sense strand RNA corresponding to a partial segment of the gene of interest is injected
into the animal (Bingham, Cell (1997) 90(3):385-7; Smyth, Curr. Biol. (1997) 7(12):793-5;
Que and Jorgensen, Dev. Genet. (1998) 22(1):100-9; and Pal-Bhadra et al., Cell (1997)
90(3):479-90).

A preferred method for generating loss-of-function phenotypes is by
double-stranded RNA interference (dsRNAi), which has been shown to be very effective in
both C. elegans (Fire et al., Nature (1998) 391:806-811) and Drosophila (Kennerdell and
Carthew, Cell (1998) 95:1017-1026). Briefly, complementary sense and antisense RNAs
derived from a substantial portion of the gene of interest are synthesized in vitro. The
templates are phagemid DNAs in which cDNA clones of the gene are inserted between
opposing promoters for T3 and T7 phage RNA polymerases. Alternatively, PCR products
can be amplified from coding regions of the gene of interest, where the primers used for the
PCR reactions are modified by the addition of phage T3 and T7 promoters. The resulting
sense and antisense RNAs are annealed in an injection buffer. In another embodiment, the
interfering double-stranded RNA can be generated in vivo by co-expression of the
complementary sense and antisense RNAs derived from the gene of interest in the same
cells. Interfering double-stranded RNA is administered to the animals usually by injection
or by soaking the animals in a solution containing the double-stranded RNA. The animals
and their progeny are then inspected for phenotypes of interest.
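
The PCR-based template strategy above, in which gene-specific primers carry phage
promoter tails, can be illustrated with a short sketch. The promoter strings below are the
commonly quoted T7 and T3 promoter sequences and are an assumption of this sketch, as are
the function names and the 20-nucleotide default primer length; they should be checked
against the documentation for the polymerase actually used.

```python
# Commonly quoted phage promoter sequences (assumed here; confirm before ordering oligos).
T7_PROMOTER = "TAATACGACTCACTATAGGG"
T3_PROMOTER = "AATTAACCCTCACTAAAGGG"

_COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence written 5' to 3'."""
    return seq.upper().translate(_COMPLEMENT)[::-1]

def tailed_primers(coding_strand, start, end, primer_len=20):
    """Design promoter-tailed primers flanking coding_strand[start:end].

    The forward primer carries a T7 tail and the reverse primer a T3 tail, so
    that transcription from either end of the PCR product yields one of the
    two complementary RNA strands.
    """
    region = coding_strand[start:end].upper()
    if len(region) < 2 * primer_len:
        raise ValueError("amplified region is too short for two primers")
    forward = T7_PROMOTER + region[:primer_len]
    reverse = T3_PROMOTER + reverse_complement(region[-primer_len:])
    return forward, reverse
```

Transcribing the resulting product separately with each polymerase would give the sense
and antisense strands that are then annealed.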

Peptide and RNA Aptamers 

Another method for generating loss-of-function phenotypes is the use of peptide
aptamers, which are peptides or small polypeptides that act as dominant inhibitors of
protein function. Peptide aptamers specifically bind to target proteins, blocking their
function (Kolonin and Finley, Proc. Natl. Acad. Sci. (1998) 95:14266-71). Due to the
highly selective nature of peptide aptamers, they may be used not only to target a specific
protein, but also to target specific functions of a given protein (e.g. a DNA binding
function). Further, peptide aptamers may be expressed in a controlled fashion by use of
promoters which regulate expression in a temporal, spatial or inducible manner. Peptide
aptamers act dominantly; therefore, they can be used to analyze proteins for which
loss-of-function mutants are not available.

Peptide aptamers that bind with high affinity and specificity to a target protein may
be isolated by a variety of techniques known in the art. In one method, they are isolated
from random peptide libraries by yeast two-hybrid screens (Xu et al., Proc. Natl. Acad. Sci.
(1997) 94:12473-78). They can also be isolated from phage libraries (Hoogenboom et al.,
Immunotechnology (1998) 4:1-20).

RNA aptamers are specific RNA ligands for proteins that can specifically inhibit
the function of the target protein (Good et al., Gene Therapy (1997) 4:45-54; Ellington et al.,
Biotechnol. Annu. Rev. (1995) 1:185-214). In vitro selection methods can be used to
identify RNA aptamers having a selected specificity (Bell et al., J. Biol. Chem. (1998)
273:14309-14). RNA aptamers can be used to decrease the expression of an SREBP
pathway protein or derivative thereof, or a protein that interacts with any one of SREBP,
S1P, S2P, and SCAP.

Transgenesis

Methods are well known for incorporating exogenous nucleic acid sequences into
the genome of animals or cultured cells to create transgenic animals or recombinant cell
lines. For invertebrate animal models, the most common methods involve the use of
transposable elements. There are several suitable transposable elements that can be used to
incorporate nucleic acid sequences into the genome of model organisms. Transposable
elements are particularly useful for inserting sequences into a gene of interest so that the
encoded protein is not properly expressed, creating a "knock-out" animal having a
loss-of-function phenotype. Techniques are well-established for the use of P element in
Drosophila (Rubin and Spradling, Science (1982) 218:348-53; U.S. Pat. No. 4,670,388) and
Tc1 in C. elegans (Zwaal et al., Proc. Natl. Acad. Sci. U.S.A. (1993) 90:7431-7435; and
Caenorhabditis elegans: Modern Biological Analysis of an Organism (1995) Epstein and
Shakes, Eds.). Other Tc1-like transposable elements can be used such as "minos" (U.S. Pat.
No. 5,348,874), "mariner" (Robertson, Insect Physiol. (1995) 41:99-105), and "sleeping
beauty" (Ivics et al., Cell (1997) 91(4):501-510). Additionally, several transposable
elements that appear to function in a variety of diverse species have been identified,
including "piggyBac" (Thibault et al., Insect Mol Biol (1999) 8(1):119-23), "hobo"
(Atkinson et al., Proc. Natl. Acad. Sci. U.S.A. (1993) 90:9693-9697), and "hermes" (Sarkar
et al., Insect Biochem & Molec. Biol. (1997) 27:359-363).

P elements, or marked P elements, are preferred for the isolation of loss-of-function
mutations in Drosophila SREBP pathway genes because of the precise molecular mapping
of these genes, depending on the availability and proximity of preexisting P element
insertions for use as a localized transposon source (Hamilton and Zinn, Methods in Cell
Biology (1994) 44:81-94; and Wolfner and Goldberg, Methods in Cell Biology (1994)
44:33-80). Typically, modified P elements are used which contain one or more elements
that allow detection of animals containing the P element. Most often, marker genes are
used that affect the eye color of Drosophila, such as derivatives of the Drosophila white or
rosy genes (Rubin and Spradling, Science (1982) 218(4570):348-353; and Klemenz et al.,
Nucleic Acids Res. (1987) 15(10):3947-3959). However, in principle, any gene can be used
as a marker that causes a reliable and easily scored phenotypic change in transgenic
animals. Various other markers include bacterial plasmid sequences having selectable
markers such as ampicillin resistance (Steller and Pirrotta, EMBO J. (1985) 4:167-171);
and lacZ sequences fused to a weak general promoter to detect the presence of enhancers
with a developmental expression pattern of interest (Bellen et al., Genes Dev. (1989)
3(9):1288-1300). Other examples of marked P elements useful for mutagenesis have been
reported (Nucleic Acids Research (1998) 26:85-88; and http://flybase.bioindiana.edu).



A preferred method of transposon mutagenesis in Drosophila employs the "local
hopping" method described by Tower et al. (Genetics (1993) 133:347-359). Each new P
insertion line can be tested molecularly for transposition of the P element into the SREBP
pathway gene of interest by assays based on PCR. For each reaction, one PCR primer is
used that is homologous to sequences contained within the P element and a second primer is
homologous to the coding region or flanking regions of the SREBP pathway gene.
Products of the PCR reactions are detected by agarose gel electrophoresis. The sizes of the
resulting DNA fragments reveal the site of P element insertion relative to the SREBP
pathway gene. Alternatively, Southern blotting and restriction mapping using DNA probes
derived from genomic DNA or cDNAs of the SREBP pathway gene can be used to detect
transposition events that rearrange the genomic DNA of the gene. P transposition events
that map to the SREBP pathway gene can be assessed for phenotypic effects in
heterozygous or homozygous mutant Drosophila.
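
The relationship between PCR product size and insertion site noted above can be made
concrete with a small sketch. It assumes a simple geometry that is not spelled out in the
text: the P element primer lies a known distance from the end of the element and points
outward, and the gene-specific primer lies at a known coordinate in the gene. The function
and parameter names are hypothetical.

```python
def insertion_site(product_bp, gene_primer_pos, p_primer_to_end_bp,
                   gene_primer_forward=True):
    """Estimate the genomic coordinate of a P element insertion.

    product_bp          size of the PCR product read from the gel
    gene_primer_pos     coordinate of the 5' end of the gene-specific primer
    p_primer_to_end_bp  distance from the 5' end of the P element primer to
                        the end of the element (assumed known)
    gene_primer_forward True if the gene primer points toward higher coordinates
    """
    genomic_bp = product_bp - p_primer_to_end_bp  # product length lying in the gene
    if genomic_bp < 0:
        raise ValueError("product is shorter than the P element portion alone")
    if gene_primer_forward:
        return gene_primer_pos + genomic_bp
    return gene_primer_pos - genomic_bp
```

Because band sizes read from an agarose gel are approximate, an estimate of this kind
would normally be confirmed by sequencing the product.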

In another embodiment, Drosophila lines carrying P insertions in an SREBP
pathway gene can be used to generate localized deletions using known methods (Kaiser,
BioEssays (1990) 12(6):297-301; Harnessing the power of Drosophila genetics, In
Drosophila melanogaster: Practical Uses in Cell and Molecular Biology, Goldstein and
Fyrberg, Eds., Academic Press, Inc. San Diego, California). This is particularly useful if no
P element transpositions are found that disrupt a particular SREBP pathway gene of
interest. Briefly, flies containing P elements inserted near an SREBP pathway gene are
exposed to a further round of transposase to induce excision of the element. Progeny in
which the transposon has excised are typically identified by loss of the eye color marker
associated with the transposable element. The resulting progeny will include flies with
either precise or imprecise excision of the P element, where the imprecise excision events
often result in deletion of genomic DNA neighboring the site of P insertion. Such progeny
are screened by molecular techniques to identify deletion events that remove genomic
sequence from the gene of interest, and assessed for phenotypic effects in heterozygous and
homozygous mutant Drosophila.

In C. elegans, the Tc1 transposable element can be used for directed mutagenesis of a
gene of interest. Typically, a Tc1 library is prepared by the methods of Zwaal et al., supra
and Plasterk, supra, using a strain in which the Tc1 transposable element is highly mobile
and present in a high copy number. The library is screened for Tc1 insertions in the region
of interest using PCR with one set of primers specific for Tc1 sequence and one set of
gene-specific primers. As described in detail in Example 4 below, using such procedures,
C. elegans strains have been isolated that contain Tc1 transposon insertions within the
SREBP gene. The screen for Tc1 deletions is performed and deletion animals are
recovered.

In addition to creating loss-of-function phenotypes, transposable elements can be
used to incorporate an SREBP pathway gene, or mutant or derivative thereof, as an
additional gene into any region of an animal's genome resulting in mis-expression
(including over-expression) of the gene. Alternatively, homologous recombination or gene
targeting techniques can be used to substitute the gene for one or both copies of the animal's
homologous gene. The transgene can be under the regulation of either an exogenous or an
endogenous promoter element, and be inserted as either a minigene or a large genomic
fragment. In one application, gene function can be analyzed by ectopic expression, using,
for example, Drosophila (Brand et al., Methods in Cell Biology (1994) 44:635-654) or C.
elegans (Mello and Fire, Methods in Cell Biology (1995) 48:451-482).

Typically, transgenic animals are created that contain gene fusions of the coding
regions of the SREBP pathway gene (from either genomic DNA or cDNA) operably joined
to a specific promoter and transcriptional enhancer whose regulation has been well
characterized, preferably heterologous promoters/enhancers (i.e. promoters/enhancers that
are non-native to the SREBP pathway genes being expressed). Heat shock
promoters/enhancers useful for temperature-induced mis-expression in Drosophila include
the hsp70 and hsp83 genes, and in C. elegans include hsp16-2 and hsp16-41. Tissue
specific promoters/enhancers are also useful, and in Drosophila, include sevenless (Bowtell
et al., Genes Dev. (1988) 2(6):620-34), eyeless (Bowtell et al., Proc. Natl. Acad. Sci. U.S.A.
(1991) 88(15):6853-7), and glass-responsive promoters/enhancers (Quiring et al., Science
(1994) 265:785-9) which are useful for expression in the eye; and enhancers/promoters
derived from the dpp or vestigial genes which are useful for expression in the wing
(Staehling-Hampton et al., Cell Growth Differ. (1994) 5(6):585-93; Kim et al., Nature
(1996) 382:133-138). Finally, where it is necessary to restrict the activity of dominant
active or dominant negative transgenes to regions where the pathway is normally active, it
may be useful to use endogenous promoters of genes in the pathway, such as the SREBP
pathway genes.

In C. elegans, examples of useful tissue specific promoters/enhancers include the
myo-2 gene promoter, useful for pharyngeal muscle-specific expression; the hlh-1 gene
promoter, useful for body-muscle-specific expression; and the mec-3 gene promoter, useful
for touch-neuron-specific gene expression. In a preferred embodiment, gene fusions for
directing the mis-expression of SREBP pathway genes are incorporated into a
transformation vector which is injected into nematodes along with a plasmid containing a
dominant selectable marker, such as rol-6. Transgenic animals are identified as those
exhibiting a roller phenotype, and the transgenic animals are inspected for additional
phenotypes of interest created by mis-expression of the SREBP pathway gene.

In Drosophila, binary control systems that employ exogenous DNA regulatory
elements and exogenous transcriptional activator proteins are particularly useful for testing
the mis-expression of genes in a wide variety of developmental stage-specific and
tissue-specific patterns. Two examples of binary exogenous regulatory systems include the
UAS/GAL4 system from yeast (Hay et al., Proc. Natl. Acad. Sci. U.S.A. (1997)
94(10):5195-200; Ellis et al., Development (1993) 119(3):855-65), and the "Tet system"
derived from E. coli (Bello et al., Development (1998) 125:2193-2202). The UAS/GAL4
system is a well-established and powerful method of mis-expression in Drosophila which
employs the UASG upstream regulatory sequence for control of promoters by the yeast
GAL4 transcriptional activator protein (Brand and Perrimon, Development (1993)
118(2):401-15). In this approach, transgenic Drosophila, termed "target" lines, are
generated where the gene of interest to be mis-expressed is operably fused to an appropriate
promoter controlled by UASG. Other transgenic Drosophila strains, termed "driver" lines,
are generated where the GAL4 coding region is operably fused to promoters/enhancers that
direct the expression of the GAL4 activator protein in specific tissues, such as the eye,
wing, nervous system, gut, or musculature. The gene of interest is not expressed in the
target lines for lack of a transcriptional activator to drive transcription from the promoter
joined to the gene of interest. However, when the UAS-target line is crossed with a GAL4
driver line, mis-expression of the gene of interest is induced in resulting progeny in a
specific pattern that is characteristic for that GAL4 line. The technical simplicity of this
approach makes it possible to sample the effects of directed mis-expression of the gene of
interest in a wide variety of tissues by generating one transgenic target line with the gene of
interest, and crossing that target line with a panel of pre-existing driver lines.
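
The logic of the target/driver cross can be summarized in a short sketch. This is a
deliberately simplified model, not a description of any particular fly stock: the driver
names and tissue sets are hypothetical, and expression is treated as all-or-none. The gene
of interest is expressed only in animals that carry both the UAS-linked target and a GAL4
driver, and then only in the tissues where that driver expresses GAL4.

```python
def expression_pattern(has_uas_target, gal4_tissues):
    """Tissues in which the gene of interest is mis-expressed for one genotype."""
    if not has_uas_target or not gal4_tissues:
        # Target alone (no activator) or driver alone (no responder): no expression.
        return set()
    return set(gal4_tissues)

def cross_with_panel(driver_panel):
    """Predict progeny expression when one target line is crossed to each driver.

    driver_panel maps a (hypothetical) driver line name to the tissues in
    which that line expresses GAL4.
    """
    return {
        driver: expression_pattern(True, tissues)
        for driver, tissues in driver_panel.items()
    }
```

A single target line crossed to a panel such as {"eye-driver": {"eye"}, "wing-driver":
{"wing"}} would thus be predicted to give eye-specific and wing-specific mis-expression in
the respective progeny.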

In the "Tet" binary control system, transgenic Drosophila driver lines are generated
where the coding region for a tetracycline-controlled transcriptional activator (tTA) is
operably fused to promoters/enhancers that direct the expression of tTA in a tissue-specific
and/or developmental stage-specific manner. The driver lines are crossed with transgenic
Drosophila target lines where the coding region for the gene of interest to be mis-expressed
is operably fused to a promoter that possesses a tTA-responsive regulatory element. When
the resulting progeny are supplied with food supplemented with a sufficient amount of
tetracycline, expression of the gene of interest is blocked. Expression of the gene of interest
can be induced at will simply by removal of tetracycline from the food. Also, the level of
expression of the gene of interest can be adjusted by varying the level of tetracycline in the
food. Thus, the use of the Tet system as a binary control mechanism for mis-expression has
the advantage of providing a means to control the amplitude and timing of mis-expression
of the gene of interest, in addition to spatial control. Consequently, if a gene of interest
(e.g. a tumor suppressor gene) has lethal or deleterious effects when mis-expressed at an
early stage in development, such as the embryonic or larval stages, the function of the gene
of interest in the adult can still be assessed by adding tetracycline to the food during early
stages of development and removing tetracycline later so as to induce mis-expression only
at the adult stage.
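
The temporal control described above can be sketched as a feeding schedule. The sketch
assumes an all-or-none response (the graded response to tetracycline level mentioned in
the text is not modeled), and the stage names and function name are illustrative
assumptions.

```python
STAGES = ("embryo", "larva", "pupa", "adult")

def expressed_stages(tetracycline_by_stage, tta_driver_present=True):
    """Return the stages at which the tTA-responsive gene is expressed.

    tetracycline_by_stage maps a stage name to True when the food contains
    enough tetracycline to block expression. Stages missing from the mapping
    are treated as tetracycline-free.
    """
    if not tta_driver_present:
        return []  # without the tTA driver there is no activator at any stage
    return [stage for stage in STAGES if not tetracycline_by_stage.get(stage, False)]
```

Supplying tetracycline through the pupal stage and withdrawing it in the adult gives
expressed_stages({"embryo": True, "larva": True, "pupa": True, "adult": False}), which
returns only the adult stage.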

Dominant negative mutations, in which a mutation to a gene creates an inactive
protein that results in a loss-of-function or reduced-function phenotype even in the presence
of a normal copy of the gene, can be made using known methods (Herskowitz, Nature (1987)
329:219-222). In the case of active monomeric proteins, over-expression of an inactive
form, achieved for example by linking the mutant gene to a highly active promoter, can
cause competition for natural substrates or ligands sufficient to significantly reduce net
activity of the normal protein. Alternatively, changes to active site residues can be made to
create a virtually irreversible association with a target.

In the case of active multimeric proteins, several strategies can guide selection of a
dominant negative mutant. In one embodiment, activity of a multimeric complex can be
decreased by expression of genes coding exogenous protein fragments that bind to the
association domains of the wild type proteins and prevent multimer formation.
Alternatively, over-expression of an inactive protein unit can sequester wild-type active
units in inactive multimers, and thereby decrease multimeric activity (Nocka et al., EMBO
J. (1990) 9:1805-1813). For example, in the case of multimeric DNA binding proteins, the
DNA binding domain can be deleted, or the activation domain deleted. Also, in this case,
the DNA binding domain unit can be expressed without the activation domain, causing
sequestering of the target DNA. Thereby, DNA binding sites are tied up without any
possible activation of expression. In the case where a particular type of unit normally
undergoes a conformational change during activity, expression of a rigid unit can also
inactivate resultant complexes. It is also possible to replace an activation domain with a
transcriptional repression domain and thus change a transcriptional activator into a
transcriptional repressor. Transcriptional repression domains from the engrailed and
Kruppel proteins have been used for such a purpose (Jaynes and O'Farrell, EMBO J. (1991)
10:1427-1433; Licht et al., Proc. Natl. Acad. Sci. USA (1993) 90:11361-65).

Expression Analysis of SREBP pathway genes 
Various expression analysis techniques may be used to identify genes which are
differentially expressed between a cell line or an animal expressing a wild type SREBP
pathway gene compared to another cell line or animal expressing a mutant SREBP pathway
gene. Such expression profiling techniques include differential display, serial analysis of
gene expression (SAGE), nucleic acid array technology, subtractive hybridization, and
proteome analysis (e.g. mass-spectrometry and two-dimensional protein gels). Nucleic acid
array technology may be used to determine a global (i.e., genome-wide) gene expression
pattern in a normal animal for comparison with an animal having a mutation in one or more
SREBP pathway genes. Gene expression profiling can also be used to identify other genes
(or proteins) that may have a functional relation to (e.g. may participate in a signaling
pathway with) or be a transcriptional target of an SREBP pathway gene. The genes are
identified by detecting changes in their expression levels following mutation, i.e., insertion,
deletion or substitution in, or over-expression, under-expression, mis-expression or
knock-out, of an SREBP pathway gene.
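
One simple way to detect such changes in expression level is a fold-change comparison
between wild-type and mutant samples, sketched below. The text does not prescribe a
particular statistic; the log2 fold-change cutoff, the pseudocount, and the gene labels
used in the example are all assumptions made for illustration, and a real analysis would
also account for replicates and measurement noise.

```python
import math

def differentially_expressed(wild_type, mutant, min_abs_log2_fc=1.0, pseudocount=1.0):
    """Return {gene: log2 fold change} for genes whose level changes markedly.

    wild_type and mutant map gene names to expression measurements (for
    example array intensities). Only genes measured in both samples are
    compared. The pseudocount avoids division by zero for unexpressed genes.
    """
    hits = {}
    for gene in wild_type.keys() & mutant.keys():
        ratio = (mutant[gene] + pseudocount) / (wild_type[gene] + pseudocount)
        log2_fc = math.log2(ratio)
        if abs(log2_fc) >= min_abs_log2_fc:
            hits[gene] = log2_fc
    return hits
```

With the default cutoff, a gene is reported when its level at least doubles or halves in
the mutant relative to wild type.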

Phenotypes associated with SREBP pathway gene mutations 

After isolation of model animals carrying mutated or mis-expressed SREBP
pathway genes or inhibitory RNAs, animals are carefully examined for phenotypes of
interest. For analysis of SREBP pathway genes that have been mutated (i.e. deletions,
insertions, and/or point mutations) animal models that are both homozygous and
heterozygous for the altered SREBP pathway gene are analyzed. Examples of specific
phenotypes that may be investigated include lethality; sterility; and changes in various
characteristics of the animal such as motility, body shape, body size and weight,
metabolism, lipid accumulation, feeding, development, morphogenesis of organs, brood
size, thermotaxis, etc. Some phenotypes more specific to flies include alterations in:
morphogenesis of the peripheral sensory organs, imaginal discs, eye development, wing
development, leg development, bristle development, antennae development, gut
development, fat body, and musculature. Some phenotypes more specific to nematodes
include: alterations in chemotaxis, a dauer constitutive phenotype, a dauer defective
phenotype, and a pale-intestine phenotype. A phenotype of particular interest in C. elegans
is the pale intestine phenotype, which is indicative of defects in lipid metabolism and is
discussed in more detail below and in the Examples.

Genomic sequences containing an SREBP pathway gene can be used to confirm
whether an existing mutant Drosophila or C. elegans line corresponds to a mutation in one
or more SREBP pathway genes, by rescuing the mutant phenotype. Briefly, a genomic
fragment containing the SREBP pathway gene of interest and potential flanking regulatory
regions can be subcloned into any appropriate Drosophila or C. elegans transformation
vector, and injected into the animals. For Drosophila, an appropriate helper plasmid is used
in the injections to supply transposase. Resulting transformants are crossed for
complementation testing to an existing panel of Drosophila or C. elegans lines whose
mutations have been mapped to the vicinity of the gene of interest (Fly Pushing: The
Theory and Practice of Drosophila Genetics, (1997) Cold Spring Harbor Press, Plainview,
NY; and Caenorhabditis elegans: Modern Biological Analysis of an Organism, supra). If a
mutant line is discovered to be rescued by this genomic fragment, as judged by
complementation of the mutant phenotype, then the mutant line likely harbors a mutation in
the SREBP pathway gene. This prediction can be further confirmed by sequencing the
SREBP pathway gene from the mutant line to identify the lesion in the SREBP pathway
gene.

Identification of genes that modify SREBP pathway genes 

The characterization of new phenotypes created by mutations in SREBP pathway
genes enables one to test for genetic interactions between SREBP pathway genes and other
genes that may participate in the same, related, or interacting genetic or biochemical
pathway(s). Individual genes can be used as starting points in large-scale genetic modifier
screens as described in more detail below. Alternatively, RNAi methods can be used to
simulate loss-of-function mutations in the genes being analyzed. It is of particular interest
to investigate whether there are any interactions of SREBP pathway genes with other
well-characterized genes, particularly genes involved in lipid metabolism. For example, a
candidate gene that may be tested for interaction with the SREBP pathway is the insulin
receptor gene (referred to as inr in Drosophila, and daf-2 in C. elegans).

Genetic Modifier Screens 

A genetic modifier screen using invertebrate model organisms is a particularly
preferred method for identifying genes that interact with SREBP pathway genes, because
large numbers of animals can be systematically screened, making it more likely that
interacting genes will be identified. In C. elegans and Drosophila, a screen of up to about
10,000 animals is considered to be a pilot-scale screen. Moderate-scale screens usually
employ about 10,000 to about 50,000 flies or up to about 100,000 worms, and large-scale
screens employ greater than about 50,000 or 100,000 flies or worms, respectively. In a
genetic modifier screen, animals having a mutant phenotype due to a mutation in one or
more SREBP pathway genes are further mutagenized, for example by chemical mutagenesis
or transposon mutagenesis. The mutagenesis procedures used in typical genetic modifier
screens of C. elegans are well known in the art. One method involves exposure of
hermaphrodites that carry mutations in one or more SREBP pathway genes to a mutagen,
such as EMS or trimethylpsoralen with ultraviolet radiation (Huang and Sternberg, Methods
in Cell Biology (1995) 48:97-122). Alternatively, transposable elements are used,
oftentimes by the introduction of a mutator locus, such as mut-2, which promotes mobility
of transposons (Anderson, Methods in Cell Biology (1995) 48:31-58).
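
The screen-size categories given above can be written as a small lookup. The cutoffs are
the approximate figures stated in the text ("about" 10,000, 50,000 and 100,000 animals)
and are treated here as hard boundaries purely for illustration; the function name and
the "fly"/"worm" labels are assumptions.

```python
# Approximate upper bounds for a moderate-scale screen, taken from the text.
MODERATE_UPPER_BOUND = {"fly": 50_000, "worm": 100_000}
PILOT_UPPER_BOUND = 10_000

def screen_scale(n_animals, organism):
    """Classify a modifier screen as 'pilot', 'moderate' or 'large'."""
    if organism not in MODERATE_UPPER_BOUND:
        raise ValueError("organism must be 'fly' or 'worm'")
    if n_animals <= PILOT_UPPER_BOUND:
        return "pilot"
    if n_animals <= MODERATE_UPPER_BOUND[organism]:
        return "moderate"
    return "large"
```

Under these cutoffs a screen of 60,000 animals would count as large-scale in flies but
only moderate-scale in worms.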

In Drosophila, the mutagenesis methods and other procedures used in a genetic
modifier screen depend upon the precise nature of the mutant allele being modified; these
methods are discussed in more detail below under the Drosophila genetic modifier screen
subheading.

Progeny of the mutagenized animals are generated and screened for the rare
individuals that display suppressed or enhanced versions of the original mutant SREBP
pathway phenotype. Such animals are presumed to have mutations in other genes, called
"modifier" genes, that participate in the same phenotype-generating pathway. The newly
identified modifier genes can be isolated away from the mutations in the SREBP pathway
genes by genetic crosses, so that the intrinsic phenotypes caused by the modifier mutations
can be assessed in isolation.

Modifier genes can be mapped using a combination of genetic and molecular
methods known in the art. Modifiers that come from a genetic screen in C. elegans are
preferably mapped with visible genetic markers and/or with molecular markers such as STS
markers (The Nematode Caenorhabditis elegans, supra; Caenorhabditis elegans: Modern
Biological Analysis of an Organism, supra). Modifier genes may be uncovered by
identification of a genomic clone which rescues the mutant phenotype, as described above.
Alternatively, modifier genes that are identified by a Tc1-based screen can be uncovered
using transposon display technology (Korswagen et al., Proc Natl Acad Sci U.S.A. (1996)
93(25):14680-5).

Standard techniques used for the mapping of modifiers that come from a genetic
screen in Drosophila include meiotic mapping with visible or molecular genetic markers;
complementation analysis with deficiencies, duplications, and lethal P-element insertions;
and cytological analysis of chromosomal aberrations (Fly Pushing: Theory and Practice of
Drosophila Genetics, supra; Drosophila: A Laboratory Handbook, supra). Genes
corresponding to modifier mutations that fail to complement a lethal P-element may be
cloned by plasmid rescue of the genomic sequence surrounding that P-element.
Alternatively, modifier genes may be mapped by phenotype rescue and positional cloning
(Sambrook et al., supra).

Newly identified modifier mutations can be tested directly for interaction with other
genes of interest known to be involved or implicated in the SREBP pathway using methods
described above. Also, the new modifier mutations can be tested for interactions with genes
in other pathways that are not believed to be related to SREBP signaling (e.g. Notch in
Drosophila, and lin in C. elegans). New modifier mutations that exhibit specific genetic
interactions with other genes implicated in lipid metabolism, but not interactions with genes
in unrelated pathways, are of particular interest.

The modifier mutations may also be used to identify "complementation groups".
Two modifier mutations are considered to fall within the same complementation group if
animals carrying both mutations in trans exhibit essentially the same phenotype as animals
that are homozygous for each mutation individually and, generally, are lethal when in trans
to each other (Fly Pushing: The Theory and Practice of Drosophila Genetics, supra).
Generally, individual complementation groups defined in this way correspond to individual
genes.
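
Assigning mutations to complementation groups from pairwise test results can be
sketched as a grouping problem. The sketch below assumes that each pairwise cross has
already been scored as either complementing or failing to complement, and it uses a
union-find structure as one convenient way to merge mutations; the mutation labels and
function name are hypothetical.

```python
def complementation_groups(mutations, fails_to_complement):
    """Group mutations that fail to complement one another.

    mutations            iterable of mutation names
    fails_to_complement  iterable of (a, b) pairs whose trans-heterozygotes
                         show the mutant phenotype
    Returns a sorted list of sorted groups.
    """
    parent = {m: m for m in mutations}

    def find(m):
        # Follow parent links to the group representative, compressing the path.
        while parent[m] != m:
            parent[m] = parent[parent[m]]
            m = parent[m]
        return m

    for a, b in fails_to_complement:
        parent[find(a)] = find(b)  # merge the two groups

    groups = {}
    for m in parent:
        groups.setdefault(find(m), []).append(m)
    return sorted(sorted(group) for group in groups.values())
```

Each group returned would, under the assumption stated in the text, generally correspond
to a single gene.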

When SREBP pathway modifier genes are identified, homologous genes in other
species can be isolated using procedures based on cross-hybridization with modifier gene
DNA probes, PCR-based strategies with primer sequences derived from the modifier genes,
and/or computer searches of sequence databases. For therapeutic applications related to the
function of the SREBP pathway, human and rodent homologues of the modifier genes are of
particular interest. For pesticide and other agricultural applications, homologues of
modifier genes in insects and arachnids are of particular interest. Insects, arachnids, and
other organisms of interest include, among others, Isopoda; Diplopoda; Chilopoda;
Symphyla; Thysanura; Collembola; Orthoptera, such as Blattella germanica; Dermaptera;
Isoptera; Anoplura; Mallophaga; Thysanoptera; Heteroptera; Homoptera, including Bemisia
tabaci, and Myzus spp.; Lepidoptera including Plodia interpunctella, Pectinophora
gossypiella, Plutella spp., Heliothis spp., and Spodoptera spp.; Coleoptera such as
Leptinotarsa decemlineata, Diabrotica spp., Anthonomus spp., and Tribolium spp.;
Hymenoptera, including Apis mellifera; Diptera, including Anopheles spp.; Siphonaptera,
including Ctenocephalides felis; Arachnida; and Acarina, including Amblyomma
americanum; and nematodes, including Meloidogyne spp., and Heterodera glycines.

Genetic modifier screens in Drosophila 

The procedures involved in typical Drosophila genetic modifier screens are
well-known in the art (Wolfner and Goldberg, Methods in Cell Biology (1994) 44:33-80; and
Karim et al., Genetics (1996) 143:315-329). The procedures used differ depending upon
the precise nature of the mutant allele being modified. If the mutant allele is genetically
recessive, as is commonly the situation for a loss-of-function allele, then most typically
males, or in some cases females, which carry one copy of the mutant allele are exposed to
an effective mutagen, such as EMS, MMS, ENU, triethylamine, diepoxyalkanes, ICR-170,
formaldehyde, X-rays, gamma rays, or ultraviolet radiation. The mutagenized animals are
crossed to animals of the opposite sex that also carry the mutant allele to be modified. In
the case where the mutant allele being modified is genetically dominant, as is commonly the
situation for ectopically expressed genes, wild type males are mutagenized and crossed to
females carrying the mutant allele to be modified.

The progeny of the mutagenized and crossed flies that exhibit either enhancement or
suppression of the original phenotype are immediately crossed to adults containing balancer
chromosomes and used as founders of a stable genetic line. In addition, progeny of the
founder adult are retested under the original screening conditions to ensure stability and
reproducibility of the phenotype. Additional secondary screens may be employed, as
appropriate, to confirm the suitability of each new modifier mutant line for further analysis.

Although the above-described Drosophila genetic modifier screens are quite
powerful and sensitive, some genes that participate in the SREBP pathway may be missed
in this approach, particularly if there is functional redundancy of those genes. This is
because the vast majority of the mutations generated in the standard mutagenesis methods
will be loss-of-function mutations, whereas gain-of-function mutations that could reveal
genes with functional redundancy will be relatively rare. Another method of genetic
screening in Drosophila has been developed that focuses specifically on systematic
gain-of-function genetic screens (Rorth et al., Development (1998) 125:1049-1057). This
method is based on a modular mis-expression system utilizing components of the
GAL4/UAS system (described above) where a modified P element, termed an "enhanced P"
(EP) element, is genetically engineered to contain a GAL4-responsive UAS element and
promoter. The resulting transposon is used to randomly tag genes by insertional
mutagenesis (similar to the method of P element mutagenesis described above). Thousands
of transgenic Drosophila strains, termed EP lines, can be generated, each containing a
specific UAS-tagged gene. This approach takes advantage of the preference of P elements
to insert at the 5'-ends of genes. Consequently, many of the genes that are tagged by
insertion of EP elements become operably fused to a GAL4-regulated promoter, and
increased expression or mis-expression of the randomly tagged gene can be induced by
crossing in a GAL4 driver gene.

Systematic gain-of-function genetic screens for modifiers of phenotypes induced by
mutation or mis-expression of an SREBP pathway gene can be performed by crossing
several thousand Drosophila EP lines into a genetic background containing a mutant or
mis-expressed SREBP pathway gene, and further containing an appropriate GAL4 driver
transgene. The progeny of this cross are then analyzed for enhancement or suppression of
the original mutant phenotype as described above. Those identified as having mutations
that interact with the SREBP pathway can be crossed further to verify the reproducibility
and specificity of this genetic interaction. EP insertions that demonstrate a specific genetic
interaction with a mutant or mis-expressed SREBP pathway gene have physically tagged
a new gene, which can be identified and sequenced using PCR or hybridization screening
methods, allowing the isolation of the genomic DNA adjacent to the position of the EP
element insertion.

BODIPY-fatty acid conjugates for determining lipid content of nematodes

Because defects in the SREBP pathway can result in abnormal metabolism of lipids,
a method for readily identifying mutant model organisms that exhibit abnormalities in lipid
metabolism would be beneficial. Prior methods for assessing lipid content in nematodes
include the use of non-vital stains such as Sudan Black (Kimura et al., Science (1997)
277:942-6).

However, the drawback of these techniques is that the nematodes must be fixed
prior to staining. Fixation can introduce artifacts, making an accurate assessment difficult,
and furthermore kills the animals, making it impossible to carry out further genetic analysis
on the animals that are tested. In order to avoid these problems associated with fixing
nematodes, certain vital stains that are routinely used for staining lipid in cultured
cells, such as Nile Red (Greenspan et al., J. Cell Biol. (1985) 100:965-973), were tried.
However, it was found that these dyes tended to result in background fluorescence of gut
granules, which are auto-fluorescent organelles of the intestinal epithelial cells that are
thought to be lysosomes. In many cases, these fluorescent vital stains appeared to be
concentrated in gut granules, enhancing their fluorescence and causing difficulty in
accurately measuring the fluorescence due to lipid droplet staining in the intestine.
Accordingly, the invention provides an improved method for measuring lipid storage in live
nematodes. It has been found that BODIPY® dyes conjugated to fatty acids (e.g. BODIPY® FL C12
(4,4-difluoro-5,7-dimethyl-4-bora-3a,4a-diaza-s-indacene-3-dodecanoic acid), and
C1-BODIPY® 500/510 C12 (4,4-difluoro-5-methyl-4-bora-3a,4a-diaza-s-indacene-
3-dodecanoic acid); Molecular Probes, Eugene, OR) concentrate in lipid droplets in the
intestines of living nematodes. These dyes do not have the drawbacks associated with other
vital dyes because, in addition to clearly staining and fluorescing in lipid droplets in the
intestine, they quench the background fluorescence due to the gut granules. Accordingly,
the invention provides a method of using BODIPY®-fatty acid conjugates to stain live
nematodes for determining the relative and absolute lipid content in response to changes in
metabolic conditions brought on by a) changes in genetic backgrounds including mutations
in genes essential for control of metabolic processes, b) changes in environmental
conditions such as food sources, temperature, and crowding conditions, and c) different
developmental states including the dauer larva. This method is particularly valuable in uses
that involve genetic screens and compound screens based on changes in metabolic processes
such as the SREBP processing pathway, among others. The method allows considerable
increases in accuracy of lipid quantification in vivo over the use of other fluorescent
lipophilic stains, making automated sorting of the nematodes based on fluorescence
feasible.

BODIPY® conjugates have previously been used to study (1) lipid content in the
surface membrane of Schistosoma mansoni worms (Redman and Kusel, Parasitology (1996)
113(2):137-143), (2) lipid endocytosis in cultured mammalian fibroblasts (Pagano and
Chen, Ann N Y Acad Sci (1998) 845:152-160), (3) lipid trafficking between the Golgi
apparatus and plasma membrane of cultured mammalian fibroblasts (Pagano et al., J. Cell.
Biol. (1991) 113(6):1267-1279), (4) fatty acid transport by Saccharomyces (Faergeman et
al., J. Biol. Chem. (1997) 272(13):8531-8538) and (5) distribution of ivermectin in muscle
vesicle membranes of Ascaris suum (Marin and Kusel, Parasitology (1992)
104(3):549-555). However, these prior uses of BODIPY® conjugates do not suggest the
applicability of BODIPY® conjugates, and in particular, BODIPY® fatty acid conjugates,
for quantification of lipid storage in nematodes. Moreover, the fact that BODIPY® fatty
acid conjugates quench background fluorescence from lysosomes, providing for more
accurate quantification, is an unexpected and important advantage provided by the invention
that permits large-scale, automated sorting of animals based on fluorescence.

BODIPY®-fatty acid conjugates can be used to stain nematodes of different genetic
backgrounds for use in genetic screens, both de novo screens for mutations affecting lipid
content of whole nematodes and modifier screens for mutations that change lipid
accumulation in mutant nematodes (for example, the insulin receptor (daf-2) or the SREBP
homolog (pin-1) nematodes). The intestines of the nematodes can be visually examined for
lipid content under a fluorescent microscope and mutant animals can be subsequently
propagated for cloning purposes. This method can be used in conjunction with automatic
flow sorter technology to rapidly separate large numbers of living nematodes by lipid
content. This would be useful either for automated high throughput genetic screening or for
large scale automated separation of dauer larvae from other developmental stages.
Additionally, the method can be used to determine changes in lipid accumulation in
nematodes exposed to inhibitory compounds that might serve as therapeutic agents for the
control of diabetes, obesity, lipid storage diseases, or other human or animal diseases. A
test compound can be administered to a nematode by direct contact, ingestion, injection, or
any suitable method and changes in lipid content of the nematode or its progeny are
observed. Further, the method is applicable to reverse genetic screening using inhibitory
RNA. For example, nematodes could be exposed to combinations of large numbers of
RNAs in 384-well plates and screened for changes in lipid content mediated by RNAi using
fluorometry or direct visual observation.


EXAMPLES 

The following examples show how the nucleic acid sequences of SEQ ID NOs 1, 3,
and 5 were isolated, and how these sequences, and derivatives and fragments thereof, as
well as other SREBP pathway nucleic acids and gene products can be used for genetic
studies to elucidate mechanisms of the SREBP pathway as well as the discovery of potential
pharmaceutical or pesticidal agents that interact with the pathway. As used herein, all C.
elegans-derived gene sequences are designated by the letters "ce" in front of the gene
sequence. Likewise, all Drosophila-derived gene sequences are designated by the letter "d"
in front of the gene sequence.

These Examples are provided merely as illustrative of various aspects of the invention and
should not be construed to limit the invention in any way.

EXAMPLE 1: CLONING OF C. ELEGANS SREBP

The C. elegans genomic database was searched with the protein sequence of the
human SREBP-1, SREBP-2, and Drosophila melanogaster SREBP homologue, HLH106,
using the TBLASTN search tool (Altschul et al., supra). One C. elegans open reading
frame showed significant homology with all three of the above SREBP proteins. This
homology extends throughout most of the SREBP protein sequences. The C. elegans open
reading frame is located on two overlapping clones on the right arm of chromosome III
(Y47D3 and H10N23). At the time of the search, there were no previous annotations, gene
predictions, or candidate mutants that mapped to this region that would suggest previous
identification of this open reading frame as an SREBP-related gene.

Using BLAST analyses (Altschul et al., supra) and the GENSCAN Genefinder
program (Burge and Karlin, J. Mol. Biol. (1997) 268(1):78-94), a predicted exon-intron
structure for the C. elegans SREBP-related gene (ceSREBP) was generated. This C.
elegans homologue of SREBP cDNA was cloned in order to validate its existence as an
expressed mRNA, and to determine the cDNA and protein sequence for the elucidation of
ceSREBP function. Moreover, cloning of ceSREBP is a prerequisite for future genetic
manipulations that require knowledge of the sequence, such as RNAi experiments,
generation of misexpression constructs, isolation of Tc1 insertion or chemical deletion
mutants, etc.

Cloning strategy:

The N-terminal and C-terminal ends of ceSREBP were cloned using gene-specific
internal primers and non-specific primers at the 3' and 5' ends. Internal primers were made
to regions of high homology according to the GENSCAN prediction for ceSREBP that were
also predicted by the ACEDB Genefinder (Richard Durbin and Jean Thierry-Mieg
(1991-present), A C. elegans Database. Documentation, code and data available from
anonymous FTP servers at lirmm.lirmm.fr, cele.mrc-lmb.cam.ac.uk and ncbi.nlm.nih.gov).
They were designed to amplify the ends of the cDNA, not the full-length cDNA. Once the
end sequence was known, the full-length cDNA was amplified in overlapping N-terminal
and C-terminal parts using gene-specific primer pairs. The template for amplification was a
mixed-stage, 1st strand cDNA pool that was synthesized from poly-A+ RNA using the NotI
primer/adapter (Life Technologies, Gaithersburg, MD).



N-terminal:

The antisense, internal primer for amplification of the N-terminal was made to sequence
encoding the ceSREBP bHLHz region: GTACGACGCTCGGTTTTTGGTC (SEQ ID NO:1).
Sense primer no. 1 was the 5' splice leader (SL1) sequence:
GGTTTAATTACCCAAGTTTGAG (SEQ ID NO:1). Amplifications were performed using
Expand™ buffers and enzyme mixes (Roche, Summerville, NJ). Amplification conditions
were as follows:

2 µl C. elegans (ce) mixed-stage 1st strand cDNA
5 µl 2 mM dNTPs
5 µl Expand™ High Fidelity 10X buffer with MgCl2
3 µl SL1, 5 µM
3 µl ceSREBP, 5 µM
0.75 µl Expand™ High Fidelity enzyme mix
31.25 µl H2O

94°C 2 min
10 cycles of: 94°C 1 sec
              52°C 30 sec
              68°C 4 min
25 cycles of: 94°C 15 sec
              52°C 30 sec
              68°C 4 min + 20 sec/cycle
72°C 5 min
4°C hold

1 µl Amplitaq™ (Perkin-Elmer, Foster City, CA) added with an additional incubation for
10 min at 72°C
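
As a quick consistency check on the reaction set-up above, the following minimal Python sketch totals the component volumes and derives the final primer and dNTP concentrations. The dictionary labels and the helper name `final_concentration` are illustrative only and do not appear in the protocol, and the sketch assumes that the buffer volume is 5 µl.

```python
# Component volumes (in microliters) for the N-terminal amplification listed above.
# The dictionary keys are illustrative labels, not names from the protocol itself.
reaction_ul = {
    "cDNA template": 2.0,
    "2 mM dNTPs": 5.0,
    "10X buffer with MgCl2": 5.0,  # assumption: 5 ul, i.e. 1X in a 50 ul reaction
    "SL1 primer, 5 uM": 3.0,
    "ceSREBP primer, 5 uM": 3.0,
    "enzyme mix": 0.75,
    "water": 31.25,
}


def final_concentration(stock_conc, stock_volume_ul, total_volume_ul):
    """Dilution formula C1*V1 = C2*V2, solved for C2."""
    return stock_conc * stock_volume_ul / total_volume_ul


total_ul = sum(reaction_ul.values())
primer_um = final_concentration(5.0, reaction_ul["SL1 primer, 5 uM"], total_ul)
dntp_mm = final_concentration(2.0, reaction_ul["2 mM dNTPs"], total_ul)

print(f"total volume: {total_ul} ul")
print(f"each primer: {primer_um} uM; dNTPs: {dntp_mm} mM")
```

Under these assumptions the components add up to a 50 µl reaction, which is consistent with 5 µl of a 10X buffer.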

PCR products were run on a 1% agarose gel. The major product was a band of
about 1 kb in size.

C-terminal:

Sense primer no. 2 was an internal primer made to a region of homology between
human SREBP1 and Genscan-predicted ceSREBP, encoding LCAVNLAE:
CCTCTGTGCAGTAAACCTTGCTG (SEQ ID NO:1). The non-specific antisense primer, pN1,
was made to the 5' end of the NotI primer/adapter: GACTACTTCTAGATGGCGAGC (SEQ ID
NO:9). The same amplification conditions as for the N-terminal were used with these
primers. The product of this amplification was 1.3 kb in size.

Cloning and sequencing:

PCR fragments were gel-purified by beta-agarase treatment (NEB, Beverly, MA) and
cloned into the pCR2.1 vector (Invitrogen, San Diego, CA) essentially according to
manufacturer's protocols for ligation and transformation into E. coli. Individual colonies
were screened for an insert of the correct size by PCR using M13 forward and reverse
primers to pCR2.1. Individual colonies from each transformation were grown up overnight
in 3 ml LB-Amp, and DNA was prepared using Easy-Pure Preps™ (super mini) (Primm
Labs, Cambridge, MA). DNA preps were digested using EcoRI enzyme to check for clones
with correct insert size. Clones were end-sequenced with Big Dye™ dye-terminator
sequencing kit (ABI, Foster City, CA) using M13 forward and reverse primers to pCR2.1.
Sequencing conditions were as follows:

1 µl miniprep DNA (~100 ng)
1 µl Big Dye™
1 µl primer, 0.8 µM
0.5 µl 5X buffer (400 mM Tris, pH 9; 10 mM MgCl2)
1.5 µl H2O

96°C for 5 min
25 cycles of: 96°C for 30 sec
              50°C for 15 sec
              60°C for 4 min

Reactions were ethanol precipitated and sequenced. Sequencing products were
analyzed using the Sequencher program (Gene Codes Corporation, Ann Arbor, MI) and
BLAST (Altschul et al., supra). A single N-term contiguous sequence ("contig") was
assembled that shared sequence identity with the YAC sequence Y47D3 (GI:3646936) from
which gene predictions were made, and was virtually identical to the gene prediction in this
region. Two non-overlapping C-terminal contigs were assembled, one of which contained
the sequence of sense primer no. 2; the other, pN1. Both shared identity with Y47D3 in
BLAST searches to C. elegans genomic sequence and showed homology to SREBP
sequences from other species. A likely termination codon and possible poly-A signals were
identified.



Amplification of full-length cDNA: 

A longer C-terminal fragment was amplified which overlapped with the N-terminal
fragment and contained the remainder of the unknown sequence. The ~2.9 kb fragment was
amplified using the same conditions as for the N- and C-terminals but with the primers
described below. Another difference is that 3 µl ce mixed stage 1st strand cDNA was used
rather than 2 µl. The primers used were a sense primer within the ~1 kb N-terminal
fragment, referred to as Y47-4, and a primer referred to as ceSREBP7, which includes
the predicted termination codon:

Y47-4: AGCAATGGAACATATCAACGGG (SEQ ID NO:1)
ceSREBP7: CAATTCAAAGATCCATAGAAGTATG (SEQ ID NO:1)

The ~2.9 kb fragment was cloned into the pCR2.1 vector as described above.

Sequencing the full-length cDNA: 

To obtain complete sequence of the full-length gene of ceSREBP, the following set
of seven sequencing primers (ceSREBPs1-s7; contained within SEQ ID NO:1) were
synthesized based on sequences derived from the original N-terminal and C-terminal
contigs, and to the most highly conserved regions of the predicted gene.

sense

ceSREBPs1: GCATGTTCAGCGACGAATGG
ceSREBPs2: GCAACACTACGACGGGCTAT
ceSREBPs3: TGGATTGCTCGCTGGAAGTG
ceSREBPs4: GGAACTTGTCGGTGGTGACG

antisense

ceSREBPs5: CCCTTGAAGCTTTGTGTCCA
ceSREBPs6: CGTGGAAGTCCGTCGTTTGA
ceSREBPs7: GGTCACCATGGATCAGCAGT

These primers, as well as M13 forward and reverse, were used to sequence clones
containing the ~1.3 and the ~2.9 kb C-terminal fragments using the above-described
sequencing methods. When aligned with previously obtained sequence, these yielded a
single open reading frame (ORF) of ~3.4 kb in size which, by BLAST analysis against
GenBank sequences, showed highest homology to other SREBPs.

Error-free cDNA clones

All the C-terminal clones that contributed to the above contig contained PCR errors
(i.e., single nucleotide discrepancies from the other clones). To obtain error-free sequence,
the ~3 kb C-term fragment was re-amplified using the high-fidelity Pfu enzyme (Stratagene)
and, as template, a commercial mixed stage cDNA library (Stratagene), in addition to the
1st strand cDNA pool.

Reaction 1

2 µl library cDNA
5 µl 10X Pfu buffer
10 µl 2 mM dNTPs
10 µl Y47-4, 5 µM
10 µl ceSREBP7, 5 µM
1 µl Pfu
12 µl H2O

Reaction 2 was the same as the first with the exception that 0.5 µl Amplitaq™ was used,
and 0.5 µl Pfu was used instead of 1 µl.

For both reactions:

94°C 2 min
35 cycles of: 94°C 15 sec
              55°C 30 sec
              72°C 6 min
72°C 10 min
4°C hold

1 µl Amplitaq™ was added with an additional incubation for 10 min at 72°C.
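
For planning purposes, the programmed hold times of the cycling profile above can be totalled as in the following rough sketch. The function name `programmed_seconds` is hypothetical. Ramp times between temperatures, the 4°C hold and the additional 10 min Amplitaq™ incubation are deliberately left out, so the actual run on an instrument would take longer.

```python
# Programmed hold times, in seconds, for the Pfu re-amplification profile above.
INITIAL_DENATURATION = 2 * 60
CYCLES = 35
CYCLE_STEPS = [15, 30, 6 * 60]  # 94 C, 55 C and 72 C steps of one cycle
FINAL_EXTENSION = 10 * 60


def programmed_seconds(initial, cycles, steps, final):
    """Sum of programmed holds; ramp times and the 4 C hold are not included."""
    return initial + cycles * sum(steps) + final


total_s = programmed_seconds(
    INITIAL_DENATURATION, CYCLES, CYCLE_STEPS, FINAL_EXTENSION
)
hours, remainder = divmod(total_s, 3600)
minutes, seconds = divmod(remainder, 60)
print(f"{total_s} s = {hours} h {minutes} min {seconds} s")
```

Most of the time is spent in the 6 min extension step, which is repeated 35 times.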

The fragments from both reactions were cloned into the pCRII vector as described
above for the pCR2.1 vector. One clone from each of the above reactions was sequenced
through using the following primers:
M13 forward and reverse; ceSREBPs1, s2, s3, s4, s6, s7 (above); and the following
additional primers ceSREBPs9-s15 (contained within SEQ ID NO:1):

sense

ceSREBPs9: CGTTGGATCGATCGCTTCCA
ceSREBPs10: CCGCCGAAGATTTTGACAGA
ceSREBPs11: TGGGACAAGGGGAGATTGTT

antisense

ceSREBPs12: GAACGTGCGTCCACCATGTG
ceSREBPs13: GCTCCAACCTTTTCGCATCT
ceSREBPs14: GGAGATGATTCGACGGGTGA
ceSREBPs15: TCCCCGGAATCACTATCCTC

These sets of reactions gave full sequence in both directions and gave identical
sequence from the two clones.

Two clones were also sequenced that contained the ~1 kb N-terminal fragment with
the following additional primers (contained within SEQ ID NO:1):

sense

ceSREBPs8: GCAGCGTCGCTTTTTGTTAA

antisense

ceSREBPs16: TGATGGTGGTGATGAGGTGG
ceSREBPs17: AATTGTTGGGTGGCGGCTAG

These reactions, together with sequence from M13 forward and reverse primers,
gave a full sequence in both directions that was nearly identical to the posted, unfinished
sequence from Y47D3. The cDNA sequence of the ceSREBP gene, SEQ ID NO:1, is shown
in Figure 2. The cDNA is 3419 nucleotides long. This full-length clone contained a single
open reading frame with an apparent translational initiation site at nucleotide position 24
and a stop signal at nucleotide position 3365. The predicted polypeptide precursor is 1113
amino acids long. Additional features include an acidic domain at about nucleotides 24 to
233 (amino acid residues 1 to 69); a possible second acidic domain at about nucleotides 987
to 1040 (amino acid residues 321 to 338); a basic helix-loop-helix domain at about
nucleotides 1089 to 1286 (amino acid residues 355 to 421); a first transmembrane domain at
about nucleotides 1455 to 1514 (amino acid residues 477 to 497); and a second
transmembrane domain at about nucleotides 1653 to 1706 (amino acid residues 543 to 561).
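
The stated coordinates can be checked against the stated protein length with simple arithmetic. The sketch below assumes that position 24 is the first base of the initiation codon and that position 3365 is the last base of the stop codon; this reading is an assumption, adopted because it reproduces the 1113-residue length given above. The helper `codon_start` is an illustrative name.

```python
# Coordinates from the text, as 1-based positions in the 3419-nt cDNA.
# Assumption: START is the first base of the initiation codon and STOP_END is
# the last base of the stop codon.
CDNA_LENGTH = 3419
START = 24
STOP_END = 3365


def codon_start(residue, orf_start=START):
    """1-based position of the first base of the codon for a 1-based residue."""
    return orf_start + 3 * (residue - 1)


orf_nt = STOP_END - START + 1  # ORF length, stop codon included
codons = orf_nt // 3           # number of codons, stop codon included
residues = codons - 1          # amino acids encoded

assert orf_nt % 3 == 0         # the ORF is a whole number of codons
print(orf_nt, codons, residues)
print("5' UTR:", START - 1, "nt; 3' UTR:", CDNA_LENGTH - STOP_END, "nt")
print("stop codon starts at", codon_start(residues + 1))
```

With these assumptions the open reading frame spans 3342 nucleotides, that is, 1113 sense codons followed by one stop codon.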

A BLAST analysis against the Y47D3 clone, which has a total of 351,956
nucleotides, revealed 12 regions of Y47D3 which share sequence identity with SEQ ID
NO:1, as shown in Table II.



TABLE II

Base # of SEQ ID NO:1    Base # of Y47D3      % Sequence Identity
1-80                     179,410-179,331      100
81-213                   178,918-178,786      100
214-523                  178,528-178,218      100
527-632                  177,448-177,338      96
633-1052                 177,286-176,864      100
1053-1288                176,520-176,285      100
1289-1482                175,768-175,568      100
1483-2011                175,523-174,994      100
2012-2408                174,687-174,288      100
2409-2636                174,228-174,001      100
2637-2790                173,954-173,801      100
2791-3151                155,054-154,694      100
3152-3397                154,638-154,393      100
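
The aligned blocks of Table II can be summarized programmatically, for instance to list block lengths and the stretches of Y47D3 that separate consecutive blocks. The following is only a sketch: the coordinates are transcribed from the table, the names `BLOCKS` and `gaps` are illustrative, and reading the unaligned stretches as introns is an interpretation rather than something the table itself states.

```python
# (cDNA start, cDNA end, Y47D3 start, Y47D3 end), transcribed from Table II.
# Y47D3 coordinates decrease along the cDNA, i.e. the match is on the reverse strand.
BLOCKS = [
    (1, 80, 179410, 179331),
    (81, 213, 178918, 178786),
    (214, 523, 178528, 178218),
    (527, 632, 177448, 177338),
    (633, 1052, 177286, 176864),
    (1053, 1288, 176520, 176285),
    (1289, 1482, 175768, 175568),
    (1483, 2011, 175523, 174994),
    (2012, 2408, 174687, 174288),
    (2409, 2636, 174228, 174001),
    (2637, 2790, 173954, 173801),
    (2791, 3151, 155054, 154694),
    (3152, 3397, 154638, 154393),
]


def gaps(blocks):
    """Number of Y47D3 bases lying between each pair of consecutive blocks."""
    return [prev[3] - nxt[2] - 1 for prev, nxt in zip(blocks, blocks[1:])]


gap_sizes = gaps(BLOCKS)
for (c0, c1, g0, g1), gap in zip(BLOCKS, gap_sizes + [None]):
    print(f"cDNA {c0}-{c1} ({c1 - c0 + 1} nt), "
          f"Y47D3 {g0}-{g1} ({g0 - g1 + 1} nt), gap after: {gap}")

print("largest gap:", max(gap_sizes))
```

The largest separation, between the blocks ending at 173,801 and starting at 155,054, is far longer than any of the others.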



An alignment of the predicted protein sequence (SEQ ID NO:2) against the human
and Drosophila SREBP proteins was performed. Amino acid residues 353 to 423 of SEQ
ID NO:2 share 45% sequence identity and 77% sequence similarity with amino acid
residues 281-351 of Drosophila melanogaster SREBP (SEQ ID NO:8; Theopold et al.,
supra; GI079656). Amino acid residues 466 to 826 of SEQ ID NO:2 share 28% sequence
identity and 47% sequence similarity with human SREBP2 (GI1082805). The presence of
other gene and protein sequences bearing significant homology to ceSREBP was
investigated using the BLAST family of computer programs (Altschul et al., supra). The
amino acid sequence of a sterol regulatory element-binding protein-1 (SREBP-1) from Mus
musculus (GI4240012) was most similar, sharing 52% sequence identity and 71% sequence
similarity with amino acid residues 335-428 of SEQ ID NO:2 and having up to 5
contiguous identical amino acids in common with SEQ ID NO:2. Sequence similarity, to a
lesser extent, was revealed between SEQ ID NO:2 and sequences from U.S. Pat. No.
5,780,262 (GI3998144), and others.

The presence of other gene and protein sequences bearing significant homology to
ceSREBP was investigated using the BLAST family of computer programs against public
databases. The following amino acid sequences were the most similar: SREBP-1, Chinese
hamster (GI 1083186); SREBP-1, Cricetulus griseus (GI 516003); Sequence 54 from
patent US 5527690 (GI 1610915); SREBP2 precursor, human (GI 1082805); SREBP-2,
Homo sapiens (GI 451330); SREBP2 precursor, Chinese hamster (GI 1083185); Sequence
38 from patent US 5527690 (GI 1610908); SREBP-1, Homo sapiens (GI 409405);
SREBP-2, Cricetulus griseus (GI 551506); Transcription factor ADD1, Rat (GI 540006);
and HLH106, Drosophila melanogaster (GI 107965).

Subsequent to the above analysis, a Genefinder prediction of the ceSREBP protein
was entered into the GenBank database, which is 100% identical to SEQ ID NO:2, and is
designated GI 3881008.

EXAMPLE 2: ceSREBP EXPRESSION ANALYSIS 
Strategy 

Expression of ceSREBP was assayed using a transcriptional reporter system in
which the putative promoter/enhancer region of ceSREBP was fused to GFP. To determine
how much genomic sequence to include in the reporter construct, the Y47D3 contig
containing the N-terminal region of the ceSREBP cDNA and ~25 kb upstream of the
ceSREBP initiation codon was analyzed using ACEDB Genefinder and GENSCAN
programs (Burge and Karlin, supra). There were no known genes within this region, and no
predicted genes reported by either program. Of the two predicted genes within ~8 kb of
ceSREBP, one, ~5 kb upstream of ceSREBP, showed limited homologies by BLAST
analysis to C. elegans expressed sequence tags (EST). A genomic fragment of ~4.5 kb was
chosen as the putative promoter/enhancer region.



Amplification of genomic enhancer/promoter region 

PCR primers were designed to amplify the ~4.5 kb genomic fragment, including the
first few amino acids of ceSREBP. Restriction sites were included in the primers to
facilitate subcloning into the GFP reporter vector pPD117.01 (from the laboratory of Dr.
Andrew Z. Fire (Fire Lab), Carnegie Institution of Washington, Baltimore, MD) in an
in-frame translational fusion to GFP. The sense primer, nucleotides 71,242-71,265 of
Y47D3 (GI:3646936), contained an AscI site; the antisense primer, nucleotides
66,719-66,747 of Y47D3, contained a KpnI/Asp718 site:

ceSREBPp1: ATGGGCGCGCCAACCAAAGTGTGATGCAACAG (SEQ ID NO:28)
ceSREBPp2: GAGGGTACCTCGTTCATTCTGAAAAAAAAAAGTC (SEQ ID NO:29)

Amplification was done in duplicate to provide two independently-amplified
promoter fragments for independent confirmation of the expression pattern; conditions were
as follows:

2 µl N2 genomic DNA (50 ng/µl)
5 µl 10X Klentaq™ buffer (Clontech, Palo Alto, CA)
1 µl 10 mM dNTPs
5 µl ceSREBPp1, 5 µM
5 µl ceSREBPp2, 5 µM
1 µl Klentaq™ enzyme
31 µl H2O

94°C 2 min
25 cycles of: 94°C 15 sec
              52°C 1 min
              72°C 5 min
72°C 10 min
4°C hold

1 µl Amplitaq™ added with an additional incubation for 10 min at 72°C.

One half of each reaction was run on a 1% Seaplaque GTG, 1X TAE gel, and the
major product at ~4.5 kb excised. The fragments were purified using Geneclean (Bio101)
and subcloned into the pCRII vector essentially according to manufacturer's protocols
(Invitrogen) for ligation and transformation. DNA from individual colonies from each
transformation was prepared using Easy-Pure preps (super mini). DNA preps were
checked using AscI+Asp718 and AscI+NotI restriction digests. Clones that appeared as
expected by digest were end-sequenced with Big Dye™ dye-terminator sequencing kit
using M13 forward and reverse primers to pCRII.

Sequences were analyzed using Sequencher. From each original PCR reaction, a
single clone that contained the expected insert was identified.

To subclone the promoter fragment into the GFP reporter vector, this vector and the
promoter fragments in pCRII were digested with AscI+Asp718, gel-purified using
Geneclean (Bio101), ligated together and transformed into E. coli using standard
procedures. DNA from individual colonies from each transformation was prepared using
Easy-Pure™ preps (super mini). DNA preps were checked using AscI+Asp718, HincII,
ClaI, DraIII, EcoRI, and HindIII restriction digests. Several clones from each original PCR
reaction were checked by end-sequencing with primers that sequenced through the two
cloning junctions. Colonies of clones that looked correct by sequence analysis were
re-streaked. DNA from individual colonies was prepared by Qiagen midi preps and
checked by restriction digest.

Expression analysis - ceSREBP::GFP 

By GFP expression analysis, ceSREBP is first expressed weakly in embryonic gut
cells at the time of gut cell polarization, which marks the beginning of differentiation.
There is strong fluorescence by the "bean stage" which persists in all intestinal cells
throughout embryogenesis and at all larval and adult stages. There is also weak
fluorescence in the pharynx. Because there is high specificity of expression of ceSREBP in
intestinal cells, the ceSREBP promoter, contained within nucleotides 66,719-71,265 of
Y47D3 (GI:3646936), has utility as a tissue specific promoter that can be operably linked to
heterologous sequences, such as marker genes and/or genes of interest. Thus, the ceSREBP
promoter can be used for studying biochemical pathways within the intestine of C. elegans.

EXAMPLE 3: RNA INTERFERENCE (RNAi) OF C. ELEGANS SREBP, S2P AND SCAP
Methods:

PCR was carried out on C. elegans sequences for SREBP (SEQ ID NO:1) and S2P
(Rawson et al., supra; GI1559384), and a GenBank sequence (GI3875380) that is annotated
as having HMG-CoA reductase homology, and additionally has been determined to have
homology to the human SCAP protein. Accordingly, GI3875380 is referred to herein as
ceSCAP. Fragments of between 0.2 kb and 2 kb were produced in regions of interest. Primers
used for each experiment are shown below. Each primer sequence had at either its 3' or 5'
end (as indicated below) the T7 RNA polymerase binding site,
ATCGATAATACGACTCACTATAGGG (SEQ ID NO:10), which is designated "T7-" below.
The remaining nucleotides in each primer sequence are from ceSREBP (SEQ ID NO:1),
ceSCAP, or ceS2P, respectively.

ceSREBP:

SREBP5'A T7-CCAGCTCAAGGCCCATCAGG
SREBP3'A T7-TCACTATCCTCATCATCCTC
SREBP5'B T7-GTACCCGGAACCAATCAATA
SREBP3'B T7-CTGATGAATTTCATGATAGA

ceSCAP:

SCAP5'A T7-CAGGACACTCCGCCTAACGA (SEQ ID NO:11)
SCAP3'A T7-ACTTACTCGTCAAATTACTC (SEQ ID NO:12)
SCAP5'B T7-GTGGCCTCCAGTTGCTCATG (SEQ ID NO:13)
SCAP3'B T7-CTTGTATTAGAAAAAAAGTG (SEQ ID NO:14)
D2013.8S T7-TGCCGCCCATCCAAAAGCCTGC (SEQ ID NO:15)
D2013.8A T7-TATACTTCGGAACCCCAAGTGG (SEQ ID NO:16)

ceS2P:

S2P5' T7-GCTCGGTCATGCGTGGGCGG (SEQ ID NO:17)
S2P3' T7-TAGCCGCCTCGACAGATTCC (SEQ ID NO:18)
S2P5'B T7-CACCGCACGGAAGCCGACGA (SEQ ID NO:19)
S2P3'B T7-CTCATTGAGCTGCCCCACAA (SEQ ID NO:20)
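
The "T7-" shorthand in the primer list can be expanded into full oligonucleotide sequences as in the following minimal sketch. It assumes that the T7 site is placed at the 5' end of each oligonucleotide, which is how the "T7-" prefix is read here; the function name `t7_tailed` and the dictionary of example primers are illustrative and not part of the original methods.

```python
# T7 RNA polymerase binding site as given in the text (SEQ ID NO:10).
T7_SITE = "ATCGATAATACGACTCACTATAGGG"

# Gene-specific portions of two of the listed ceSREBP primers.
GENE_SPECIFIC = {
    "SREBP5'A": "CCAGCTCAAGGCCCATCAGG",
    "SREBP3'A": "TCACTATCCTCATCATCCTC",
}


def t7_tailed(gene_specific, t7_site=T7_SITE):
    """Return the T7 site followed by the gene-specific sequence (5' to 3')."""
    seq = gene_specific.upper()
    unexpected = set(seq) - set("ACGT")
    if unexpected:
        raise ValueError(f"unexpected characters in primer: {sorted(unexpected)}")
    return t7_site + seq


for name, seq in GENE_SPECIFIC.items():
    full = t7_tailed(seq)
    print(name, full, len(full))
```

Each 20-nucleotide gene-specific primer therefore corresponds to a 45-nucleotide oligonucleotide once the 25-nucleotide T7 site is attached.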

PCR was carried out with 0.5 µM each primer and 0.4 µg genomic DNA using the
Expand™ PCR Kit (Roche) at the following conditions:

94°C 1 min 15 sec
35 cycles of: 94°C 15 sec
              57°C 45 sec
              72°C 1 min

A small fraction of each reaction (2 to 5 µl) was run on a gel to assure that the PCR worked.
The rest of each reaction was precipitated and then resuspended in RNase-free water, to
serve as the template for production of sense and antisense RNAs. Sense and antisense
RNA were transcribed separately from the DNA template using T7 and T3 RNA
polymerases (Promega, Madison, WI; RNA production kit, Cat#1300) following the
manufacturer's protocol. The resulting RNA samples were ethanol-precipitated and
resuspended in 20 µl of RNase-free TE (10 mM Tris, 1 mM EDTA), followed by 10 µl of
RNase-free 3X IM annealing buffer (20 mM KPO4 pH 7.5, 3 mM K citrate pH 7.5, 2% PEG
6000). The reactions were mixed and incubated at 68°C for 10 minutes and then at 37°C for
30 minutes to anneal the sense and antisense strands. Alternatively, sense and antisense
sequences were transcribed together with T7 sites on both strands following the same
protocols.

Injection volumes were selected to deliver 0.5 x 10^6 to 1 x 10^6 molecules of RNA.
Injections were delivered to the gonads or the intestinal cavity of C. elegans, and were
carried out using the methods of Fire et al. (Development (1991) 113:503-514).

For germline RNAi, adult animals were microinjected with RNA into either the
gonad or intestine using a glass needle mounted on a Medical Systems Corp. (Holliston,
MA) PLI-90 injector. For RNAi of larvae, wild type L1 larvae were isolated by first
collecting embryos from gravid adults by digestion in 1.25% sodium hypochlorite, 0.25 M
potassium hydroxide, and then allowing the embryos to hatch overnight in M9 buffer. Equal
volumes of larvae in M9 buffer and RNA were mixed in wells of microtiter plates,
incubated for 24 hours at 15°C, and then transferred to standard nematode growth plates.

For visualization of lipid, some of the worms were washed off a plate using M9
buffer (per liter: 30 g Na2HPO4, 15 g KH2PO4, 2.5 g NaCl, 5 g NH4Cl), collected by
centrifugation, and resuspended in a 2 ng/ml solution of BODIPY™ FL C12 (stock solution
is 1 mg/ml in ethanol) prepared in M9 buffer. The worms were placed on a benchtop shaker
overnight at room temperature to absorb the dye. Images were captured using a
fluorescence microscope (Axioplan™, Zeiss, Thornwood, NY) the next day.

Results: 

ceSREBP RNAi 

Germline ceSREBP (pin-1) RNAi produces several visible phenotypes in the
progeny of the microinjected animals. The gross phenotype is a fully penetrant larval arrest.
Arrested larvae appear to be at the L2 stage based on gonad and cuticle morphology,
although their length is more similar to that of L1 stage larvae. Arrested larvae remain
motile and feeding for several days at 20°C before dying. Their intestine appears paler, or
less darkly pigmented, than wildtype, and this is referred to as the "pale intestine" or "Pin"
phenotype.

Morphological defects in ceSREBP RNAi larvae (L1 and L2 stages) are confined to
the intestine, where ceSREBP appears to be primarily expressed, and specifically affect
three cytoplasmic structures in intestinal cells. First, there is a dramatic reduction in the
number and average size of pigmented droplets in the intestine. This reduction of
pigmented droplets seems to account for the Pin phenotype observed at low magnification.
These droplets likely contain lipid since they stain with dye-labeled fatty acid
(BODIPY™-dodecanoic acid) and their number in various developmental stages and
mutants correlates with the level of staining with the dye Sudan black in fixed animals.
These observations indicate that ceSREBP is required for formation and/or
maintenance of lipid droplets in the intestine, the main lipid storage organ of C. elegans.
Second, the gut granules appear larger and more birefringent than in wildtype. Third, many
variably sized vesicles appear in the intestine. These vesicles are spherical and transparent;
similar vesicles are only rarely observed in wildtype larvae. The vesicles in ceSREBP
RNAi larvae are usually each associated with a gut granule, and they show autofluorescence
similar in color and intensity to that of gut granules. Since gut granules are thought to be
lysosomal structures, the abnormal vesicles in ceSREBP RNAi larvae may also be
lysosomal in origin. While many of the intestinal vesicles are immediately visible upon
microscopic examination, the number and size of vesicles appear to increase over several
minutes of observation, often as larvae begin to show signs of cellular degradation and
death. Ultraviolet illumination accelerates this process in ceSREBP RNAi larvae and can
also induce formation of similar vesicles in wildtype, although to a lesser extent. These
observations may indicate that absorbance of visible or ultraviolet light by the birefringent,
autofluorescent gut granules causes damage that induces swelling of lysosomes and
synergizes with the effect of ceSREBP RNAi. The larval arrest and morphological defects
in the intestine described above are also observed in mutant larvae homozygous for the
pin-1 (ceSREBP) partial deletion allele ep79 (see Example 4), suggesting that germline
RNAi phenocopies the zygotic null phenotype.

ceSREBP RNAi of larvae at the L1 stage results in apparently normal development
through the L2 stage, with approximately normal accumulation of intestinal pigmented
droplets. However, most larvae arrest at the L3 or L4 stage and fail to maintain their
droplets. Arrested larvae, as well as many fully developed adults, show the Pin phenotype
and have a thinner body than normal. The number and size of pigmented intestinal droplets
are greatly reduced, as observed in earlier stages for germline RNAi. The finding that the Pin
phenotype can be induced by RNAi treatment after terminal differentiation of the intestine
indicates that the phenotype is unlikely to be caused by a developmental defect in the
intestinal cells. Rather, ceSREBP may be required continuously for proper functioning of
the intestine. The pale, thin appearance of ceSREBP RNAi larvae and adults is similar to
that of starved animals; however, the RNAi animals display foraging behavior and pump in
bacteria through the pharynx into the intestine. These observations suggest that ceSREBP
RNAi larvae are defective in digesting and/or metabolizing food. ceSREBP RNAi larvae
show greater dispersal away from the food source than wildtype, possibly because they
cannot derive nutrients from the bacteria. Transparent intestinal vesicles are observed less
frequently with L1 ceSREBP RNAi than with germline ceSREBP RNAi, although most
larvae and adults accumulate many vesicles within several minutes of microscope
observation under visible or ultraviolet light. Gut granules of the arrested larvae and adults
are often larger and more birefringent than normal. Adults that display the Pin phenotype
have fewer embryos than normal in their uterus, suggesting reduced fecundity, and some of
the embryos show variable developmental defects. Finally, Pin adults often contain large,
transparent vacuoles in the anterior half of the intestine. These vacuoles are distinct from
the abnormal vesicles observed in larvae, since the vacuoles are irregularly shaped and not
autofluorescent, although their origin remains unidentified. ceSREBP RNAi of larvae at the
L2 stage results in the same defects as L1 treatment, but mainly in later stages of
development. Most animals arrest at the L4 stage or display the adult defects.

The daf-2 (e1370) temperature-sensitive mutation (described by Gems et al.,
Genetics (1998) 150:129-155) produces an opposite phenotype to that of ceSREBP RNAi, a
dark intestine (Din) phenotype associated with increased accumulation of pigmented lipid
droplets in the intestine. SREBP RNAi can suppress the Din phenotype of daf-2 (e1370),
suggesting interaction between the pin-1/SREBP pathway and the daf-2/insulin-like signaling
pathway. Specifically, daf-2 (e1370) larvae shifted to non-permissive temperature (25°C) at
the L1 stage constitutively form dauer larvae with dark intestines. ceSREBP RNAi of these
larvae at the L1 stage results in Pin dauers with reduced intestinal lipid droplets. daf-2
(e1370) larvae shifted to non-permissive temperature at the L3 stage, after the critical period
for commitment to dauer formation, form L4 larvae and adults with dark intestines. If the
larvae are also treated with pin-1 RNAi at the L1 stage, then they can develop a less dark
intestine at the L4 and adult stages. pin-1 does not appear to be strictly epistatic to daf-2;
rather, double mutants show an intermediate phenotype. Some pin-1 (ep79) homozygotes
escape larval arrest and can establish semi-viable strains of Pin animals with small, thin
bodies and reduced brood size. Double mutants daf-2 (e1370) pin-1 (ep79) at 20°C are
partially suppressed for all these phenotypes and, in particular, show a less pale intestine.
These results suggest that pin-1 and daf-2 interact to determine the level of lipid
accumulation in the intestine.

ceS2P RNAi 

Germline RNAi of the site 2 protease (S2P) homolog results in apparently normal
development through the adult stage; however, adults show a fully penetrant phenotype
exhibiting all the defects observed for pin-1 larval RNAi (except larval arrest). Specifically,
the adult phenotype includes a small, thin body, pale intestine associated with few lipid
droplets, abnormally large and birefringent gut granules, large vacuoles in the anterior
intestine, fewer embryos in the uterus, and variable developmental defects in some of the
embryos. The gut granule defects seem more pronounced than observed for pin-1 RNAi.
The striking similarity of the RNAi phenotypes for ceS2P and pin-1 strongly suggests that
these two genes function in a common genetic pathway. The lack of effect of ceS2P RNAi
on larval development may indicate functional redundancy with an unidentified gene or
reduced potency of RNAi for ceS2P compared to pin-1.

ceSCAP RNAi 

Germline RNAi of the SCAP homologue generates a phenotype similar to ceS2P
RNAi in less than 10% of adults. Defective adults display a pale intestine, small and thin
body, few embryos in the uterus, and slightly more birefringent gut granules. Germline
RNAi of both ceS2P and ceSCAP together produces a fully penetrant phenotype
indistinguishable from pin-1 germline RNAi. This phenotype includes L2-L3 larval arrest,
pale intestine associated with few or no intestinal lipid droplets, and abnormally large and
birefringent gut granules. These results suggest that both the ceS2P and ceSCAP
homologues function in the pin-1 genetic pathway at all larval and adult stages. If RNAi of
ceS2P or ceSCAP produces the null phenotype for these genes, then there must exist other
gene activities that can partially substitute for their functions, presumably in proteolytic
cleavage at site 2 and 1 analogues, respectively, of PIN-1.

EXAMPLE 4: DOMINANT NEGATIVE ceSREBP PHENOTYPES 

A putative dominant negative form of ceSREBP (ceSREBP.DN) was constructed
containing amino acids 90-480 of ceSREBP (SEQ ID NO:2). This form lacks the
amino-terminal acidic transcriptional activation domain, as well as the C-terminal
regulatory part which includes both transmembrane domains. This inactive version, which
should not be subject to normal ceSREBP processing, is expected to dimerize with the wild
type protein and thereby decrease overall transcriptional activity.

ceSREBP.DN was amplified by PCR using sense primer CeSREBP5'DNSacI
(CCCGAGCTCATGCGATTTTCCCCGCCAAACTTTGATC; contained within SEQ ID NO:1),
and antisense primer CeSREBP3'CAMfeI
(GGGCAATTGCTAAAGGGTAACTTTCGAAGATCCATCTC; contained within SEQ ID NO:1).
The fragment was cloned into vector pPD99.52 (Fire Lab) behind the heat shock promoter
hsp 16/41, which allows temperature-induced activation of the downstream gene.
ceSREBP.DN was injected into N2 worms using standard protocols for C. elegans
transformation (Caenorhabditis elegans: Modern Biological Analysis of an Organism,
supra) at a concentration of 10 µg/ml plus 100 µg/ml pRF4 rol-6(d) transformation marker,
and stable lines displaying the roller phenotype were established.

Misexpression of the ceSREBP.DN transgene was induced by incubating the worms
carrying the transgene at 33-34°C. Worms were grown at 20°C before and after the
heat-shock. Embryos received a 30 minute heat-shock; larvae and adults received a 2-3 hr
heat-shock. Worms were analyzed for several days after the heat shock under the dissecting
microscope, to assess characteristics such as developmental stage, size, pigmentation,
mobility (as an indicator of general health), and development of the germ line. Some
worms were also analyzed using Nomarski optics on the Zeiss Axioplan™ to assess cellular
defects, particularly in the intestine and germ line.
Results: 

Results of heat-shock experiments are as follows and are characterized in terms of
the majority phenotypes and the minority or variable phenotypes:

When embryos are heat-shocked, the majority of animals exhibit slow growth and
become small adults with defective germlines and intestinal defects, including reduced lipid
content and especially birefringent gut granules. A minority of the animals show
embryonic arrest or larval lethality, or become adults with pale intestines.

When L1 larvae are heat-shocked, most animals exhibit slow growth and develop
into small adults with defective germlines, a mottled appearance, and intestinal defects. A
minority of animals have clear vesicles in their intestines.

When L2 larvae are heat-shocked, the majority of animals develop pale intestines as
late larvae. A minority of animals exhibit slow growth, become adults with pale intestines,
and/or small adults with defective germlines.

When L3 or L4 larvae are heat-shocked, the majority become adults with a mottled
appearance and especially birefringent gut granules. A minority of animals exhibit slow
growth, become adults with defective germlines, or become very pale and sickly adults.

The majority of heat-shocked adults display no consistent phenotypes, but have
various intestinal and germline defects.

The pale intestine phenotype that results from mis-expression of the dominant
negative construct is consistent with the pale intestine phenotype that results from ceSREBP
RNAi (described in Example 3 above). The construct may be used as a counterscreening
reagent in screens for modifiers of ceSREBP.

EXAMPLE 5: TC1 TRANSPOSON MUTAGENESIS

The goal of this set of experiments was to produce loss-of-function mutations in
genes of interest in order to understand the function of their wild-type counterparts.
Library preparation 

A Tc1 transposon insertion library comprising 3 sets of 960 cultures was constructed
according to published protocols (Zwaal et al., supra, and Plasterk, supra). Very briefly,
5-10 non-synchronized mut-2 (MT3126) animals were cultured on 100 mm peptone plates
(2880 plates total) for 12 days. Each culture was then resuspended in M9 medium and
aliquoted into 3 separate tubes, in identical positions, of 3 different racks (each rack holding
96 tubes). Two of the aliquots were frozen for long-term storage, and one was lysed for DNA
preparation. Lysates were pooled in a 3-dimensional matrix, and their DNA was purified.
10X to 50X dilutions of each DNA prep were used for library screenings by PCR.

Library screening 

The library was screened in individual tiers, each library having three tiers, with
each tier composed of 1,000 lysates or ~200,000 haploid genomes. Lysates were pooled
according to the published protocol. A first dimension screen involved PCR on 8 samples
of pooled DNA from ten 96-well plates. A second dimension screen was used to determine
which of the ten 96-well plates contained the desired mutant (this involves screening of 10 DNA
pools). A third dimension screen was used to determine the "address" of a particular mutant
(i.e., in which column and row a particular mutant resided, via screening of 12 individual
lysates from a single row). First dimension reactions were done in quadruplicate; second
and third were done in triplicate.
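
The three-dimensional addressing described above can be illustrated with a short Python sketch. It assumes that an address such as "5D10" reads as plate number, row letter, and column number, and that the 8 first-dimension samples correspond to the 8 rows of a 96-well plate; neither reading is stated explicitly in the text, and the function names are hypothetical.

```python
import re

ROWS = "ABCDEFGH"   # assumed: 8 first-dimension pools, one per plate row
N_PLATES = 10       # ten 96-well plates per tier (second-dimension pools)
N_COLUMNS = 12      # 12 individual lysates per row (third-dimension screen)


def parse_address(address):
    """Split an address such as '5D10' into (plate, row, column).

    The plate-row-column interpretation is an assumption for illustration.
    """
    match = re.fullmatch(r"(\d+)([A-H])(\d+)", address)
    if match is None:
        raise ValueError(f"unrecognized address: {address!r}")
    plate, row, column = int(match.group(1)), match.group(2), int(match.group(3))
    if not 1 <= plate <= N_PLATES or not 1 <= column <= N_COLUMNS:
        raise ValueError(f"address out of range: {address!r}")
    return plate, row, column


def reactions_per_tier(first_reps=4, later_reps=3):
    """Count PCR reactions per round needed to locate one mutant in one tier."""
    first = len(ROWS) * first_reps    # row pools, in quadruplicate
    second = N_PLATES * later_reps    # plate pools, in triplicate
    third = N_COLUMNS * later_reps    # single lysates, in triplicate
    return first + second + third


print(parse_address("5D10"))
print(reactions_per_tier())
```

Under these assumptions, pooling reduces the search for one insertion among roughly a thousand lysates to fewer than a hundred reactions per PCR round.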

Two rounds of PCR were performed, the first with a pair of gene-specific primers
and the second with a pair of Tc1-specific primers. Two different pairs of Tc1 primers were
used: one pair pointing outward from the left of the transposon, and the other pair pointing
outward from the right (these primer pairs are described in the references cited above).
The first and second round PCR for each dimension was performed in 15 µl total
volume using the following in each reaction:

1X PCR buffer provided by the manufacturer (Perkin-Elmer)
1.5 mM MgCl2
0.2 mM dNTPs
0.5 µM each of the Tc1 and the gene-specific primer
0.5 units of Taq Polymerase (Perkin-Elmer)
H2O to 13 µl for the first round reactions, and to 15 µl for the second round

For the first and second dimensions, 2 µl of 1:20 diluted DNA was added; 1:10 diluted
DNA was added to the third dimension reactions. A small amount of first round reaction
was transferred to the second round using a pin replicator. PCR cycling conditions were:
94°C for 3 minutes; then 94°C for 40 seconds, 58°C for 1 minute, 72°C for 2 minutes for
35 cycles; then 72°C for 2 minutes.
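
As a rough planning aid, the cycling conditions above can be written as data and summed. The Python sketch below counts only the programmed hold times; temperature ramp times depend on the instrument and are ignored, so the figure is a lower bound. The variable and function names are illustrative only.

```python
# Hold times in seconds, transcribed from the cycling conditions above.
INITIAL_DENATURATION = 3 * 60
CYCLE_STEPS = [
    ("denature, 94C", 40),
    ("anneal, 58C", 60),
    ("extend, 72C", 120),
]
N_CYCLES = 35
FINAL_EXTENSION = 2 * 60


def total_hold_seconds(initial, cycle_steps, n_cycles, final):
    """Sum programmed hold times; ramping between temperatures is not included."""
    per_cycle = sum(seconds for _, seconds in cycle_steps)
    return initial + n_cycles * per_cycle + final


total = total_hold_seconds(INITIAL_DENATURATION, CYCLE_STEPS, N_CYCLES, FINAL_EXTENSION)
print(f"{total} s = {total / 60:.1f} min per round")
```

Because two rounds are run for each dimension, the minimum thermocycler time per dimension is twice this value.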



Insertion Screening 

The primers used for Tc1 insertion analysis were as follows:

Tc1 primers:

Tc1 L1 (round 1, left)   CGTGGGTATTCCTTGTTCGAAGCCAGCTAC (SEQ ID NO:21)
Tc1 L2 (round 2, left)   TCAAGTCAAATGGATGCTTGAGA (SEQ ID NO:22)
Tc1 R1 (round 1, right)  TCACAAGCTGATCGACTCGATGCCACGTCG (SEQ ID NO:23)
Tc1 R2 (round 2, right)  GATTTTGTGAACACTGTGGTGAAGT (SEQ ID NO:24)

C. elegans SREBP gene-specific primers (each contained within SEQ ID NO:1):

Y47-1 (round 1)  CCCACTCTGTCAAAATCTTCGG
Y47-2 (round 2)  TCAGTGAATAGTGTTGCCGTGC
Y47-4 (round 1)  AGCAATGGAACATATCAACGGG
Y47-3 (round 2)  ACGACCAAGGTTTTCTTTTCCC
Y47-5 (round 1)  TCATTGAGGTATGGTGTGGTGG
Y47-6 (round 2)  GACCTCCACCCATTTTTGTGAG
Y47-8 (round 1)  TGTTGTTTGTGCACAGCATGAG
Y47-7 (round 2)  ACGAGCCCTCAGAACAAAACAG



Results: 

Four confirmed Tc1 insertions were found in the ceSREBP gene: one insertion
within intron 2 (found using Y47-5/6 and Tc1 R1/R2; address 1D10); one insertion within
intron 5 (found using Y47-4/3 and Tc1 L1/L2; address 5D10); one insertion within intron 7
(found using Y47-1/2 and Tc1 R1/R2; address 6D2); and one insertion within intron 8
(found using Y47-1/2 and Tc1 L1/L2; address 1D2). All addresses are from Tier 1 of the
Tc1 library described above.

Two of the insertion addresses were chosen for further analysis based on their
relatively central location within the SREBP gene: 5D10, located just upstream of the
predicted basic helix-loop-helix coding region, and 6D2, located downstream of the two
predicted transmembrane domain coding regions.



Identification of insertion animals: 

Nematodes from the 6D2 and 5D10 addresses were recovered from frozen stocks
representing these addresses; these stocks were made from each culture upon preparation of
the library. In order to identify a nematode carrying either a 6D2 or 5D10 insertion,
individual surviving nematodes were cloned to individual plates, and after progeny from
these nematodes were present on the plates, the parent nematodes were picked into
individual wells of a 96-well plate containing 5 µl of nematode lysis buffer (100 mM KCl,
20 mM Tris-HCl pH 8.3, 5 mM MgCl2, 0.9% Nonidet P-40, 0.9% Tween-20, 0.02% gelatin,
and 400 µg/ml proteinase K). The nematodes were lysed in a PCR machine at 60°C for one
hour, followed by 95°C for 15 minutes. 18 µl of a PCR master mix was then added to the
crude lysates (to give ~20 µl total reaction volume, assuming evaporation of a portion of the
lysate); this mix contained:

1X reaction buffer provided by the manufacturer (Perkin-Elmer)
1.5 mM MgCl2
0.2 mM each dNTP
0.5 µM each gene-specific primer
0.5 units Taq polymerase
to 18 µl per reaction with dH2O

The PCR reactions were cycled using a program identical to that used for screening
the library for the insertions described above. Subsequently, a second round of PCR was
performed using the same conditions and primers noted above for the insertion screen, after
transferring a small amount of the first round reaction to the second round master mix using
a pin replicator. Reactions were run on 1% agarose gels, and gels were analyzed for
insertion products identical in size to those observed in the original screen for insertions.
Using this PCR-based screen, a population of nematodes was obtained that is
homozygous for the 6D2 insertion. However, since the location of this Tc1 insertion was
confirmed to be within an intron, and Tc1 elements are often completely removed along
with the intron during splicing of the pre-mRNA, this insertion population was used to
identify a deletion in the ceSREBP gene by imprecise excision of the Tc1 element (as
described above).



Identification of a Tc1-mediated deletion

In order to obtain a Tc1-mediated deletion in the ceSREBP gene, a small library
consisting of 244 cultures of 6D2 insertion nematodes was generated. To create the library,
5-10 nematodes homozygous for the 6D2 insertion were seeded onto individual plates.
After these nematodes had grown, reproduced, and consumed all of the bacteria on these
plates, triplicate lysates representing these cultures were created by collecting a sample of
nematodes from each plate by washing with a solution of distilled water, and placing the
nematodes washed from each plate in one well of a 96-well plate (this was repeated two
additional times to create a triplicate set of lysates). Nematodes were lysed by addition of
an equal volume of lysis buffer (100 mM KCl, 20 mM Tris-HCl pH 8.3, 5 mM MgCl2,
0.9% Nonidet P-40, 0.9% Tween-20, 0.02% gelatin, and 400 µg/ml proteinase K) followed
by incubation at -80°C for 15 minutes, 60°C for 3 hours, and 95°C for 15-30 minutes.

Deletion screening was carried out using a PCR-based approach similar to that used
for insertion screening, both of which have been described previously (Zwaal et al., supra;
and Plasterk, supra). Two sets of gene-specific primer pairs were chosen for carrying out a
nested PCR strategy such that an outside set was used for the first round of PCR and an
inside set was used for the second round of PCR. The second round of PCR was performed
to achieve greater specificity in the reaction. The primer sets listed below were chosen
since they are ~3.2 kb apart in the ceSREBP genomic sequence (within the typical range for
Tc1 deletion screening), and since they flank the Tc1 insertion in the 6D2
population.

Deletion Screening 

ceSREBP gene-specific primers used to identify candidate deletions in ceSREBP
were: Y47-1 (round 1), Y47-2 (round 2), Y47-13 (round 1), and Y47-14 (round 2). Primers
used in a "specificity test", i.e. a secondary screen for confirming candidate deletions, were:
Y47-1 (round 1), Y47-2 (round 2), Y47-4 (round 1), and Y47-3 (round 2). All primers are
contained within SEQ ID NO:1:

Y47-1   CCCACTCTGTCAAAATCTTCGG
Y47-2   TCAGTGAATAGTGTTGCCGTGC
Y47-13  GCTTCTTCGGTTACTAGTTAAC
Y47-14  TCAGGAGCATGTTCAGCGACG

The first round PCR reactions were performed using 2 µl of lysate from two of the
three sets of lysates, with reactions carried out in a 96-well plate. Each lysate was added to
18 µl of PCR reaction master mix aliquoted into each well:

1X reaction buffer provided by the manufacturer (Perkin-Elmer)
1.5 mM MgCl2
0.2 mM each dNTP
0.5 µM each gene-specific primer
0.5 units Taq polymerase
to 18 µl per reaction with dH2O

The reactions were carried out in duplicate using the following cycling parameters:
94°C for 3 minutes, then 35 cycles of the following: 94°C for 40 seconds, 55°C for 1
minute, and 72°C for 1 minute. The second round of PCR was performed essentially as
above, except that 19.5 µl of the mixture as described for the first round reaction was
aliquoted to each reaction. A small amount of first-round reaction products was transferred
to the second-round reaction mixtures using a 96-pin replicator. The same temperature
cycling sequence was used for the second round as described for the first round.

Products of the second round of PCR were analyzed by electrophoresis in 1%
agarose gels. A potential deletion product was observed in both of the reactions, and the
putative positive lysate was re-tested by performing duplicate reactions using the relevant
lysate from all 3 sets of the library (for a total of six reactions) in two rounds of PCR as
described above. The product was gel purified and sequenced directly to confirm the
presence of the desired deletion. In addition, in order to confirm that the deletion product
obtained was specific for the SREBP region (i.e. not an artifactual result of the PCR), an
additional primer set was used in two rounds of PCR as above in a separate set of reactions
with all three lysates along with one of the two original primer pairs. This primer set was
chosen such that the PCR product generated would be ~100-300 base-pairs different in size
from the original deletion product, resulting in a noticeable shift in size from the original
product when analyzed on 1% agarose gels. This part of the screening procedure is termed
the "specificity test". Using this procedure to screen the 244 lysates from the 6D2 insertion
library with the primers listed above, one deletion of ~2.2 kb within the ceSREBP genomic
region was identified, and confirmed by the specificity test (primers used for this test are
included in the table above) and by sequence analysis. This deletion begins within intron 6,
and ends within exon 9 of the ceSREBP gene. After confirmation, this partial deletion
allele was named pin-1 (ep79).
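
The size logic behind the deletion screen and the specificity test can be sketched in Python. This is an approximation under stated assumptions: the ~3.2 kb primer spacing and the ~2.2 kb deletion are both taken as wild-type genomic distances, the small offset between outer and nested primers is ignored, and the 0.1 kb threshold mirrors the lower end of the 100-300 base-pair shift mentioned above. The function names are hypothetical.

```python
def expected_product_kb(primer_span_kb, deletion_kb):
    """Approximate product size once a deletion removes part of the primer span."""
    if deletion_kb >= primer_span_kb:
        raise ValueError("deletion would remove at least one primer site")
    return primer_span_kb - deletion_kb


def is_distinguishable(size_a_kb, size_b_kb, min_shift_kb=0.1):
    """True if two products differ by at least the assumed resolvable shift."""
    return abs(size_a_kb - size_b_kb) >= min_shift_kb


product = expected_product_kb(3.2, 2.2)
print(f"expected deletion product: ~{product:.1f} kb")
print(is_distinguishable(product, product + 0.2))
```

With a one-minute extension step, a product of about 1 kb should amplify far more readily than the full-length span, which is consistent with the short extension time used in the deletion screen.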


Identification of deletion animals 

Following the identification and confirmation of this ~2.2 kb deletion, 192
individual nematodes from the relevant plate were cloned onto separate, fresh plates. When
F1 animals were present on the plate, the parent nematodes were placed into buffer present
in 96-well plates and lysed as described above. The same primer pairs and cycling
conditions used to identify the deletion were used to perform PCR on these animals. Of
the 192 nematodes screened, one was found by PCR to carry the deletion.

Analysis of mutant phenotypes 

Prior to analysis of the SREBP deletion animals, animals carrying the SREBP
deletion were outcrossed ten times to a wild-type (N2/Bristol) strain in order to remove
extraneous, unrelated mutations induced by the high number of Tc1 elements present in the
original mutator strain from which the insertion and deletion in the ceSREBP gene were
isolated. Throughout the outcrossing procedure, the SREBP deletion was followed and
maintained by analyzing progeny of these crosses by PCR, using the same primers and
conditions used for the deletion screen above.

Since reduction or elimination of function mutations often recapitulate phenotypes
observed by RNA mediated interference, which in the case of ceSREBP included larval
arrest, the deletion mutation was placed in trans to a balancer chromosome, and maintained
as a heterozygous strain. This is based on the assumption that homozygous deletion
mutants could not be propagated if the mutation results in a larval
arrest phenotype.

The outcrossed and balanced strain was analyzed for any mutant phenotypes that
might arise as a result of the SREBP deletion. It was found that ~25% of the progeny
derived from heterozygous SREBP deletion animals (which would correspond to presumptive
deletion homozygotes) displayed the phenotypes observed as the result of SREBP
RNA-mediated interference described in Example 3 above. These phenotypes include:
early larval arrest, reduced pigmentation as a result of reduced number of lipid droplets in
the intestine, and accumulation of fluid-filled vesicles.
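
The ~25% figure matches the simple Mendelian expectation for self-fertilization of a heterozygote. The Python sketch below enumerates the gamete combinations; it assumes equal transmission of both chromosomes and equal viability up to the stage scored, and it does not model any phenotype of the balancer chromosome itself. The allele labels are placeholders.

```python
from fractions import Fraction
from itertools import product


def selfing_genotype_fractions(parent=("del", "+")):
    """Expected genotype fractions among self-progeny of a diploid parent."""
    counts = {}
    for pair in product(parent, repeat=2):   # one allele from each gamete
        genotype = tuple(sorted(pair))
        counts[genotype] = counts.get(genotype, 0) + 1
    total = sum(counts.values())
    return {genotype: Fraction(n, total) for genotype, n in counts.items()}


expected = selfing_genotype_fractions()
print(expected[("del", "del")])   # fraction of presumptive deletion homozygotes
```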



EXAMPLE 6: CLONING OF DROSOPHILA S2P

Using BLAST, two EST clones from the Berkeley Drosophila Genome Project
(BDGP), LD11632 (AA391707) and LD14421 (AA439767), were found to have homology
with hamster S2P (GI2745731). The sequences were contained in two P1 clones, D379 and
D380 (AC005465). Primers were used for primer walking to get the full-length DNA
sequence. Several more sequencing reactions were performed to produce complete and
unambiguous coverage of the gene, which is referred to herein as Drosophila S2P (dS2P).
The primer sequences below are contained within SEQ ID NO:3.
dS2P SEQUENCING PRIMERS:
(from primer walking)

507: GGTGAACAAGACAGCTCTTCG
852: AACGGTGGGAATCACTATGTCAG
1118: TGATGGTCAGCTACAGTGCTG
186: TTTCGTGAAGGTGAAATAGCAG

(to resolve ambiguities)

dS2P.s2: GGTCTTCAGCATAGGATTGG
dS2P.s3: CACAGTTCGAGTGACATCCC
dS2P.s4: GTGAGATGGCGCTGCTTTCG
dS2P.s5: GCACAAGGGTTGTGATGTAG
dS2P.s6: TACTCAGCCCGGTGTTCTTG



Results: 

A full length clone (SEQ ID NO:3) was identified that contained a single open
reading frame with an apparent translation start site at nucleotide position 219, and a stop
signal at nucleotide position 1745. The predicted polypeptide precursor is 508 amino acids
long (SEQ ID NO:4). A search of the PFam and PROSITE databases (Sonnhammer et al.,
Genomics (1997) 46:200-216; Bairoch et al., NAR (1991) 19 Suppl:2241-2245; and
Hofmann et al., NAR (1999) 27:215-219) revealed seven transmembrane domains and a
PDZ domain. The transmembrane domains are located at approximately amino acid
residues 4 to 20 (TM1), 82-98 (TM2), 143-159 (TM3), 163-179 (TM4), 208-224 (TM5),
428-444 (TM6) and 478-494 (TM7). The putative PDZ domain is located at approximately
amino acid residues 215-285.
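
The reported coordinates can be checked against the reported polypeptide length with a few lines of Python. The sketch assumes that the start position is the first nucleotide of the initiator codon and that the "stop signal" position is the last nucleotide of the stop codon; the text does not state which convention was used, and the function name is illustrative.

```python
def orf_length_aa(start_nt, stop_nt):
    """Amino acids encoded between a start codon and a stop codon.

    start_nt: 1-based position of the first nucleotide of the start codon.
    stop_nt:  1-based position of the last nucleotide of the stop codon
              (an assumed reading of "stop signal at nucleotide position").
    """
    span = stop_nt - start_nt + 1
    if span % 3 != 0:
        raise ValueError("coordinates are not in frame under this convention")
    return span // 3 - 1   # the stop codon encodes no amino acid


print(orf_length_aa(219, 1745))   # dS2P coordinates from the paragraph above
```

Under this convention the dS2P coordinates give 508 residues, in agreement with the stated length; the same calculation applied to the dSCAP coordinates reported in Example 7 (73 and 3786) gives 1237.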

The presence of other gene and protein sequences bearing significant homology to
Drosophila S2P (Fig. 2, SEQ ID NO:4) was investigated using the BLAST family of
computer programs (Altschul et al., supra). The following amino acid sequences were the
most similar: S2P Homo sapiens (GI2745733); S2P Cricetulus griseus (GI2745731); SP2
metalloprotease, Homo sapiens (GI4164134 and GI4164135); putative protein Arabidopsis
thaliana (GI2982448); conserved protein Methanobacterium thermoautotrophicum
(GI2622476); and Orf c04034 Sulfolobus solfataricus (GI1707806). The most homologous
sequence was human S2P (GI2745733) which shared 9 contiguous amino acids at positions
201-207 of SEQ ID NO:4. Amino acids 127 to 501 of SEQ ID NO:4 share 32% sequence
identity with amino acids 148 to 515 of GI2745733.

EXAMPLE 7: CLONING OF DROSOPHILA SCAP 

The Drosophila SCAP homologue (dSCAP) identified herein was cloned by PCR
based on sequence from a gene prediction and from 5' RACE. BLAST analysis of the
hamster SCAP (GI1675220) revealed a genomic P1 clone, DS06954, with regions of high
homology. GENSCAN genefinder analysis of this P1 predicted a cDNA that included these
homologous regions and was partially covered by ESTs. dSCAP was cloned in overlapping
N-terminal and C-terminal fragments with a common HindIII restriction enzyme site.

N-terminal sequence not represented within the gene prediction was obtained by
RACE from embryo cDNA prepared with the Marathon system (Clontech). A short N-terminal
fragment was amplified using non-specific primer AP1
(CCATCCTAATACGACTCACTATAGGGC; SEQ ID NO:25) to the Marathon adaptor and
antisense primer dSCAP6 (TCTGGTCCAGCTGCCCGTGTGTTCC; contained within SEQ ID
NO:5) contained within the gene prediction and the 5' EST. Amplification conditions were
as follows:

1 µl Marathon cDNA
1 µl 10 mM dNTPs
5 µl Klentaq™ buffer
2 µl AP1, 5 µM
2 µl dSCAP6, 5 µM
1 µl Klentaq™ polymerase
38 µl H2O

94°C 2 min
5 cycles of:   94°C 15 sec
               70°C 4 min
5 cycles of:   94°C 15 sec
               68°C 4 min
25 cycles of:  94°C 15 sec
               62°C 30 sec
               72°C 3 min
72°C 4 min
12°C hold

1 µl Amplitaq™ added with an additional incubation for 10 min at 72°C



The major PCR product as determined on a 1% agarose gel was an ~0.7 kb band. This
fragment was cloned into the pCRII shuttle vector (Invitrogen) and completely sequenced
using M13 forward and reverse primers, and the start codon was identified. Based on the
N-terminal sequence identified, a longer N-terminal fragment was amplified from Marathon
embryo cDNA using primers dSCAP11 (TTGGTATACGGATAGAAATTGG; SEQ ID NO:5)
and dSCAP2 (GCGTTTGGGTATTCGTTGCTCC; SEQ ID NO:5). Conditions were as
follows:

2 µl Marathon cDNA
5 µl 2 mM dNTPs
5 µl Expand High Fidelity 10X buffer with MgCl2
3 µl dSCAP11, 5 µM
3 µl dSCAP2, 5 µM
0.75 µl Expand High Fidelity enzyme mix
31.25 µl H2O

94°C 3 min
35 cycles of:  94°C 15 sec
               50°C 30 sec
               72°C 4 min
72°C 4 min
12°C hold

1 µl Amplitaq™ added with an additional incubation for 10 min at 72°C
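
The final concentrations implied by the dSCAP11/dSCAP2 reaction mix can be worked out from the listed volumes and stock concentrations (final = stock x volume / total volume). The Python sketch below is illustrative only; the dictionary layout and function names are assumptions, and the Amplitaq added after cycling is not counted in the reaction volume.

```python
# name: (volume in ul, stock concentration or None, unit), from the list above.
COMPONENTS = {
    "Marathon cDNA": (2.0, None, None),
    "dNTPs": (5.0, 2.0, "mM"),
    "Expand High Fidelity buffer": (5.0, 10.0, "X"),
    "dSCAP11": (3.0, 5.0, "uM"),
    "dSCAP2": (3.0, 5.0, "uM"),
    "Expand High Fidelity enzyme mix": (0.75, None, None),
    "H2O": (31.25, None, None),
}


def total_volume(components):
    """Total reaction volume in ul."""
    return sum(volume for volume, _, _ in components.values())


def final_concentrations(components):
    """Final concentration of every component with a stated stock concentration."""
    total = total_volume(components)
    return {
        name: (stock * volume / total, unit)
        for name, (volume, stock, unit) in components.items()
        if stock is not None
    }


print(f"total volume: {total_volume(COMPONENTS)} ul")
for name, (conc, unit) in final_concentrations(COMPONENTS).items():
    print(f"{name}: {conc:g} {unit}")
```

The volumes sum to 50 µl, giving 1X buffer, 0.2 mM dNTPs, and 0.3 µM of each primer.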

The major PCR product as determined on a 1% agarose gel was an ~1.6 kb band.
The C-terminal fragment was amplified from embryo 1st strand cDNA using sense primer
dSCAP3 (CTCAGTCGCATCCAAAACTGTG; SEQ ID NO:5) and antisense primer dSCAP4
(TTAGGCGCGCCTATTCCTAGGTGCTAGCGAACC; SEQ ID NO:5) made to the predicted
cDNA sequence. Amplification conditions were as follows:

 2 µl     1st strand cDNA
 5 µl     2 mM dNTPs
 5 µl     Expand High Fidelity 10X buffer with MgCl2
 3 µl     dSCAP3, 5 µM
 3 µl     dSCAP4, 5 µM
 0.75 µl  Expand High Fidelity enzyme mix
31.25 µl  H2O

94°C 3 min
15 cycles of:  94°C 15 sec
               60°C 30 sec
               72°C 2 min
20 cycles of:  94°C 15 sec
               60°C 30 sec
               72°C 2 min + 20 sec/cycle
72°C 5 min
4°C hold

1 µl Amplitaq™ added with an additional incubation for 10 min at 72°C

The major PCR product as determined on a 1% agarose gel was an ~2.2 kb band.
Both N-terminal and C-terminal fragments were cloned into pCRII and completely
sequenced in both directions.
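
The "72°C 2 min + 20 sec/cycle" step of the C-terminal amplification lengthens the extension with each successive cycle. The sketch below tabulates the resulting extension times. It assumes the 20-second increment is already applied in the first of the 20 cycles; the text does not say whether the first cycle runs at exactly 2 min, and the function name is hypothetical.

```python
def extension_times(base_s, increment_s, cycles):
    """Extension time in seconds for each cycle, adding a fixed increment per cycle.

    Assumption: cycle 1 already includes one increment.
    """
    return [base_s + increment_s * n for n in range(1, cycles + 1)]


# 2 min base extension, +20 sec per cycle, over the 20 incremented cycles.
times = extension_times(base_s=120, increment_s=20, cycles=20)

print("first cycle:", times[0], "s")
print("last cycle: ", times[-1], "s")
print("total extension time:", sum(times) / 60, "min")
```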

Results 

A full-length clone was identified that contained a single open reading frame with an
apparent translation initiation site at nucleotide position 73 and a stop signal at nucleotide
position 3786 (SEQ ID NO:5). The predicted polypeptide precursor is 1237 amino acids
long (SEQ ID NO:6). Additional features include:

1) A Ribosomal RNA adenine dimethylase at nucleotides 667 to 703 (amino acid
residues 198 to 210);

2) Four G-beta (GB) repeat WD domains: GB1 at nucleotides 2509 to 2617,
corresponding to amino acid residues 812 to 848; GB2 at nucleotides 3080 to 3196,
corresponding to amino acid residues 1005 to 1041; GB3 at nucleotides 3208 to 3325,
corresponding to amino acid residues 1045 to 1084; and GB4 at nucleotides 3337 to 3445,
corresponding to amino acid residues 1088 to 1124;

3) Six predicted transmembrane (TM) domains: TM1 at nucleotides 991 to 1039,
corresponding to amino acid residues 306 to 322; TM2 at nucleotides 1117 to 1165,
corresponding to amino acid residues 348 to 364; TM3 at nucleotides 1180 to 1228,
corresponding to amino acid residues 369 to 385; TM4 at nucleotides 1366 to 1414,
corresponding to amino acid residues 431 to 447; TM5 at nucleotides 1753 to 1801,
corresponding to amino acid residues 560 to 576; and TM6 at nucleotides 2353 to 2401,
corresponding to amino acid residues 760 to 776.
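
The reported open reading frame and domain coordinates can be sanity-checked with a few lines of arithmetic. The sketch below assumes that nucleotide position 3786 is the last base of the stop codon (an interpretation, not stated in the text) and uses only the amino acid coordinates given above; the helper name `orf_length_aa` is hypothetical.

```python
def orf_length_aa(start_nt, stop_end_nt):
    """Number of encoded residues for an ORF running from the first base of the
    start codon to the last base of the stop codon (both 1-based, inclusive)."""
    span = stop_end_nt - start_nt + 1
    if span % 3 != 0:
        raise ValueError("ORF span is not a whole number of codons")
    return span // 3 - 1  # the stop codon encodes no residue


# dSCAP: initiation at nucleotide 73, stop signal at nucleotide 3786.
dscap_length = orf_length_aa(73, 3786)

# Predicted transmembrane domains, as amino acid residue ranges from the text.
tm_domains = {
    "TM1": (306, 322),
    "TM2": (348, 364),
    "TM3": (369, 385),
    "TM4": (431, 447),
    "TM5": (560, 576),
    "TM6": (760, 776),
}
tm_lengths = {name: end - start + 1 for name, (start, end) in tm_domains.items()}

print("predicted precursor length:", dscap_length)
print("TM domain lengths:", tm_lengths)
```

Under that assumption the computed length agrees with the 1237 residues stated for SEQ ID NO:6, and each predicted transmembrane segment spans 17 residues.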

The presence of other gene and protein sequences bearing significant homology to
dSCAP (SEQ ID NO:5) was investigated using BLAST (Altschul et al., supra) against
nucleotide databases. This revealed that dSCAP is covered by two genomic clones from
BDGP: DS06954 (P1 D338), and DS05325 (P1 D340). The accession number for the two
clones is AC007121. Other sequences bearing nucleotide homology with dSCAP are
human mRNA for KIAA0199 gene (GI 1228046), and Cricetulus griseus SCAP mRNA (GI
1228046). At the protein level, dSCAP shares homology with the following sequences: C.
elegans predicted SCAP D2013.8 (GI 642180), Homo sapiens KIAA0199 gene (GI
1228047), Cricetulus griseus SCAP (GI 1675220), and is similar to the transmembrane
domain of HMGCOA (GI 3875380).

EXAMPLE 8: TRANSGENIC DROSOPHILA MISEXPRESSING SREBP 

The wild-type Drosophila SREBP (dSREBP) (HLH106) gene was cloned by PCR.
The coding sequence of the gene was amplified in overlapping N-terminal and C-terminal
regions from a Drosophila adult cDNA library (Stratagene, cat #936603). Primers used to
amplify the N-terminal region were sense primer HLH106.1
(AATGGACACGACACTGATGAAC; SEQ ID NO:7) and antisense primer HLH106.2
(AGCCATGTTGCTTGCGAATAGT; SEQ ID NO:7). Primers used to amplify the C-terminal
region were sense primer HLH106.3 (AAACAGGCGCTGGCATCTGCAC; SEQ ID NO:7) and
antisense primer HLH106.4 (GGCGCGCCCACGTTCGTGCCACTTATTATGTA; SEQ ID NO:7).
The fragments were spliced together using the common restriction site SacII.

In addition to the wild-type gene, one putative constitutively active form of dSREBP
(dSREBP.CA) and three putative dominant negative forms were engineered for
misexpression in Drosophila. All were designed based on precedents in mammalian
SREBP research (reviewed by Brown and Goldstein, supra). These constructs, as well as
the wild-type gene, may be used both as screening or counterscreening reagents and as
devices to further elucidate the function of SREBP in Drosophila.

Sequences of all fragments were verified. All constructs were cloned into
pExPress-UAS. pExPress is a vector designed specifically for misexpression of genes in
transgenic Drosophila. This vector was derived from pGMR (Hay et al., Development
(1994) 120:2121-2129). The vector is 9 kb long, and contains: an origin of replication for
E. coli; an ampicillin resistance gene; P element transposon 3' and 5' ends to mobilize the
inserted sequences; a White marker gene; and an expression unit comprising the TATA region
of hsp70 enhancer and the 3' untranslated region of α-tubulin gene. The expression unit
contains a first multiple cloning site (MCS) designed for insertion of an enhancer and a
second MCS located 500 bases downstream, designed for the insertion of a gene of interest.
DNA constructs are cloned into the EcoRI and/or EcoRI/AscI sites of the second MCS.
Fragments cloned into pExPress-UAS were injected into yw Drosophila embryos
using standard protocols for Drosophila transformation (Rubin and Spradling, supra). A
variety of GAL4 driver lines were used to drive mis-expression of the transgenes. Driver
lines glass, sevenless, Kruppel, Rhomboid, 2677, and 1878 are available from the
University of Indiana (http://flybase.bio.indiana.edu). Lines T93, T113, and T155 were
kindly provided by Tian Xu (Yale University School of Medicine, New Haven, CT, USA).
Descriptions of the larval expression patterns of the GAL4 drivers are presented in Table III.



TABLE III

GAL4-Driver        Larval Expression Pattern

EYE
3X glass (GMR)     Photoreceptor cells, very strong expression
2X sevenless       R7 photoreceptor cells
2677               Transiently, during eye development

FAT BODY
T93                Fat body, wing and eye discs, brain, salivary glands
T113               Fat body, wing and eye discs, salivary glands
T155               Fat body, wing and eye discs, brain, salivary glands

GUT/GENERAL
1878               Ubiquitous (fat body, gut, discs, trachea, brain, etc.)
Kruppel            General gut, fat body, brain and segmental neurons,
                   salivary glands
Rhomboid           Whole gut, segmental nerves, salivary glands, minor fat
                   body and salivary gland staining


The putative activated form, dSREBP.CA, contains amino acids 1-448 of dSREBP
(SEQ ID NO:8) and lacks the C-terminal regulatory region, including the
membrane-spanning domains, and thus should require no processing to activate
transcriptional targets. dSREBP.CA was amplified by PCR from a clone of wild-type
dSREBP using sense primer HLH106.1 (AATGGACACGACACTGATGAAC; SEQ ID NO:7)
and antisense primer HLH106.CA (CTAGCGAGAGTGGGTGGCCATGC; SEQ ID NO:7). The
observed phenotypes for this construct under various driver lines are presented in Table IV.
The phenotypes exhibited upon expression in the fat body are evidence that the dSREBP
transgene exerts metabolic effects.

TABLE IV

GAL4-Driver           Phenotype

Line #1 - No driver   Bristles (macrochaete) shortened, often missing
Line #2 - No driver   No phenotype

EYE
3X glass (GMR)        Line 1: Lethality, embryonic or larval
                      Line 2: Strong rough eye
2X sevenless          Lines 1 & 2: Lethality, early pupal
2677                  Line 1: Lethality, embryonic or larval
                      Line 2: Rough and reduced eye

FAT BODY
T93                   Lines 1 & 2: Reduced male viability, reduced female
                      fertility, adults with caved-in abdomens and starved
                      appearance, persistence of the larval fat body in adults,
                      short life spans (all w/variable penetrance)
T113                  Line 1: Mostly pupal lethal, most survivors are female
                      and have the abdomen phenotype of T93.
T155                  Lines 1 & 2: Larval lethal; a few escapers appear
                      normal.

GUT/GENERAL
1878                  Lethal - embryo or larvae
Kruppel               Lethal - embryo or larvae
Rhomboid              Lethal - embryo or larvae


Two putative dominant negative forms of dSREBP, "dSREBP.DNr (Dominant
Negative regulated)" and "dSREBP.DNur (Dominant Negative unregulated)", lack the
amino-terminal acidic domain and should be transcriptionally inactive. They are expected
to act by competing with the wild-type protein in dimerization to make transcriptionally
inactive dimers. They differ by the inclusion of the C-terminal regulatory region.
dSREBP.DNr includes the full regulatory region and should be active only in conditions in
which dSREBP is cleaved from the ER membrane; it contains amino acids 75-1113 of
dSREBP. dSREBP.DNur lacks the regulatory region and should not require processing; it
may therefore be a more potent inhibitor of transcription. dSREBP.DNur contains amino
acids 75-448 of dSREBP.
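
The residue ranges given for the engineered dSREBP forms can be summarized in a small data structure. The sketch below is illustrative only: it assumes the amino-terminal acidic domain lies within residues 1-74 (inferred from the dominant negative forms beginning at residue 75) and treats any construct extending to residue 1113 as carrying the C-terminal regulatory region. The class and property names are hypothetical.

```python
from dataclasses import dataclass

FULL_LENGTH = 1113  # last residue of dSREBP, taken from "amino acids 75-1113"


@dataclass(frozen=True)
class Construct:
    name: str
    first: int  # first dSREBP residue included
    last: int   # last dSREBP residue included

    @property
    def has_acidic_domain(self) -> bool:
        # Assumption: the acidic domain lies within residues 1-74.
        return self.first == 1

    @property
    def has_regulatory_region(self) -> bool:
        # Assumption: constructs ending at residue 1113 keep the regulatory region.
        return self.last == FULL_LENGTH


CONSTRUCTS = [
    Construct("dSREBP.CA", 1, 448),
    Construct("dSREBP.DNr", 75, 1113),
    Construct("dSREBP.DNur", 75, 448),
    Construct("p450/dSREBP", 521, 1113),
]

summary = {c.name: (c.has_acidic_domain, c.has_regulatory_region) for c in CONSTRUCTS}
for name, (acidic, regulated) in summary.items():
    print(f"{name:12s} acidic domain: {acidic!s:5s} regulatory region: {regulated}")
```

Read this way, only dSREBP.CA retains the acidic domain, and only dSREBP.DNr and p450/dSREBP would be expected to depend on processing.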

The 5' part of dSREBP.DNr was amplified by PCR from a clone containing the 5' end of
dSREBP, using sense primer HLH106.DN (CGCAATGTCCGTCGAGCAACAGCCGCAC;
SEQ ID NO:7) and antisense primer HLH106.2 (AGCCATGTTGCTTGCGAATAGT; SEQ ID
NO:7). This fragment was spliced together with an overlapping clone containing the 3' part
of dSREBP using the common restriction site SacII. dSREBP.DNur was similarly amplified
using sense primer HLH106.DN and antisense primer HLH106.CA.

The third putative dominant negative, "p450/dSREBP", is expected to act through
interaction with SCAP. p450/dSREBP contains the C-terminal regulatory region of
dSREBP (amino acids 521-1113), fused to the mammalian cytochrome p450
transmembrane domain, which acts to anchor the protein in the ER. The p450
transmembrane domain, with a 5', in-frame ATG, was generated by annealing two
complementary oligonucleotides. The sense-strand oligonucleotide was p450.1
(CTGGAATTCAACATGGATCCAGTGGTGGTGCTGGGACTCTGCCTCTCCTGCTTGCTTCTCCTTT
CACTCTGGAAGCAGAGCTATGGAGGAGGAAAGCTT; SEQ ID NO:26). The antisense-strand
oligonucleotide was p450.2
(AAGCTTTCCTCCTCCATAGCTCTGCTTCCAGAGTGAAAGGAGAAGCAAGCAGGAGAGGCAGAGT
CCCAGCACCACCACTGGATCCATGTTGAATTCCAGAGCT; SEQ ID NO:27). This transgene
should act by titrating out dSCAP, leaving less available for the processing of wild-type
dSREBP. Table V summarizes the observed phenotypes for the dominant negative
constructs under various driver lines.
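
The complementarity of the two p450 oligonucleotides can be checked directly from the sequences listed above. The sketch below reverse-complements p450.1 and compares it with p450.2; the helper name and variable names are illustrative. It reports any nucleotides of p450.2 that extend beyond the duplex, without assuming what such an overhang is used for.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an uppercase DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


# Sense-strand oligonucleotide p450.1 (SEQ ID NO:26), as listed in the text.
P450_1 = (
    "CTGGAATTCAACATGGATCCAGTGGTGGTGCTGGGACTCTGCCTCTCCTGCTTGCTTCTCCTTT"
    "CACTCTGGAAGCAGAGCTATGGAGGAGGAAAGCTT"
)
# Antisense-strand oligonucleotide p450.2 (SEQ ID NO:27), as listed in the text.
P450_2 = (
    "AAGCTTTCCTCCTCCATAGCTCTGCTTCCAGAGTGAAAGGAGAAGCAAGCAGGAGAGGCAGAGT"
    "CCCAGCACCACCACTGGATCCATGTTGAATTCCAGAGCT"
)

rc = reverse_complement(P450_1)
anneals = P450_2.startswith(rc)   # does p450.2 pair with all of p450.1?
overhang = P450_2[len(rc):]       # bases of p450.2 left unpaired at its 3' end

print("p450.2 pairs with full-length p450.1:", anneals)
print("unpaired 3' bases on p450.2:", overhang or "(none)")
```

With the sequences as printed, p450.2 pairs with the whole of p450.1 and carries four additional nucleotides at its 3' end.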



TABLE V

Line            GAL4-Driver        Phenotype

                EYE
Wild Type       3X glass (GMR)     Rough eye; disorganization of ommatidia
                2X sevenless       Some very mild roughness in posterior eye
                2677               Some very mild roughness in posterior eye

                EYE
dSREBP.DNr      3X glass (GMR)     Mild rough eye; disorganization of ommatidia
                2X sevenless       No phenotype
                2677               Some very mild roughness in posterior eye

dSREBP.DNur     3X glass (GMR)     Strong rough eye
                2X sevenless       No phenotype
                2677               Some very mild roughness in posterior eye

                GUT/GENERAL
                Kruppel            Lethality, embryonic or larval

p450/dSREBP     EYE
                3X glass (GMR)     Strong rough eye, fused ommatidia

The present invention is not to be limited in scope by the specific embodiments
described herein. Indeed, various modifications of the invention in addition to those
described herein will become apparent to those skilled in the art from the foregoing
description and accompanying drawings. Such modifications are intended to fall within the
scope of the appended claims. The disclosure of each reference cited herein, including
patents and other references, is hereby incorporated herein by reference in its entirety.

WHAT IS CLAIMED IS:

1. An animal that is a fly or nematode that has been genetically modified to express or
mis-express an SREBP pathway protein, or the progeny of said animal that has inherited
said SREBP pathway protein expression or mis-expression.

2. The animal of Claim 1 that has been genetically modified by a method selected from
the group consisting of transposon insertion mutagenesis, double-stranded RNA
interference, and chemical mutagenesis.

3. The animal of Claim 1 wherein a heterologous promoter drives expression or
mis-expression of said SREBP pathway protein.

4. The animal of Claim 3 wherein said promoter is selected from the group consisting
of tissue-specific promoters, developmental-specific promoters, and inducible promoters.

5. The animal of Claim 4 wherein said animal is a fly and said promoter is selected
from the group consisting of sevenless, eyeless, glass, dpp, heat shock, tTA-responsive,
GAL4-responsive, and vestigial.

6. The animal of Claim 1 wherein said SREBP pathway protein is encoded by an
SREBP pathway nucleic acid sequence linked to a nucleic acid sequence that encodes one
or more selectable markers that allows detection of expression of said SREBP pathway
protein.

7. The animal of Claim 1 wherein said expression or mis-expression of said SREBP
pathway protein results in an identifiable phenotype.

8. The animal of Claim 1 wherein said SREBP pathway protein comprises an amino
acid sequence selected from the group consisting of SEQ ID NOs:2, 4, 6, and 8, or a
functionally-active fragment thereof.

9. The animal of Claim 8 wherein said SREBP pathway protein is encoded by part or
all of a nucleic acid sequence selected from the group consisting of SEQ ID NOs:1, 3, 5,
and 7.

10. The animal of Claim 7 wherein said nematode is heterozygous for deletion of
SREBP.

11. The animal of Claim 7 wherein said animal is a nematode and said identifiable
phenotype is a pale intestine phenotype or other intestinal defect.

12. A method for studying lipid metabolism comprising detecting the phenotype caused
by the expression or mis-expression of said SREBP pathway protein in the animal of Claim
1.

13. The method of Claim 12 additionally comprising observing a second animal having
the same genetic modification as the animal of Claim 1 which causes said expression or
mis-expression of said SREBP pathway protein, and wherein said second animal
additionally comprises a mutation in a gene of interest, wherein differences, if any, between
the phenotype of the animal of Claim 1 and the phenotype of the second animal identifies
the gene of interest as capable of modifying the function of the gene encoding said SREBP
pathway protein.

14. The method of Claim 13 wherein said gene of interest is implicated in cholesterol or
fatty acid biosynthesis or metabolism.

15. The method of Claim 13 wherein said animal is a nematode and wherein said
phenotype is a pale intestine phenotype or other intestinal defect indicative of abnormalities
in lipid biosynthesis or metabolism.

16. The method of Claim 13 wherein said animal is a nematode and wherein said
method includes staining said nematode in vivo with a fluorescently-labelled fatty acid
conjugate to measure lipid content within said nematode.

17. The method of Claim 16 wherein said fluorescently-labelled fatty acid conjugate is
a BODIPY™-fatty acid conjugate.

18. The method of Claim 13 additionally comprising administering one or more
compounds to said animal or its progeny and observing any changes in lipid content of said
animal or its progeny.

19. A method for determining the lipid content of a living nematode comprising
contacting said nematode with a BODIPY™ fatty acid conjugate to stain lipid and
measuring fluorescence as an indication of lipid content.

20. The method of Claim 19 which is used in combination with a genetic screen for
detection of mutations that affect lipid content.

21. The method of Claim 19 that additionally includes administering one or more
compounds to said nematode or its progeny and observing any effect said compound has on
lipid content.

22. An isolated nucleic acid molecule of less than 15 kb comprising a nucleic acid
sequence selected from the group consisting of:

A) a nucleic acid sequence that encodes a polypeptide comprising at least 10 contiguous
amino acids of the sequence of any one of SEQ ID NO:2, 4 and 6; and

B) a nucleic acid sequence that encodes a polypeptide comprising at least 8 contiguous
amino acids of residues 335 to 428 of SEQ ID NO:2.

23. The isolated nucleic acid molecule of Claim 22 that hybridizes under appropriate
conditions to a nucleic acid sequence selected from the group consisting of SEQ ID NOs:1,
3 and 5.

24. The isolated nucleic acid molecule of Claim 23 wherein said appropriate conditions
comprise hybridization at 34°C in a buffer comprising 6X SSC / 0% formamide and a wash
at 45°C in a buffer comprising 2X SSC.

25. A vector comprising the nucleic acid molecule of Claim 22.

26. A host cell comprising the vector of Claim 25.

27. The host cell of Claim 26 wherein said cell is a yeast cell.

28. A process for producing an SREBP pathway protein comprising culturing the host
cell of Claim 26 under conditions suitable for expression of said SREBP pathway protein
and recovering said protein.

29. An isolated SREBP pathway protein produced by the process of Claim 28.

30. The isolated SREBP pathway protein of Claim 29 which is joined at its amino- or
carboxy-terminus via a peptide bond to an amino acid sequence of a different protein.

31. A nucleic acid molecule comprising a C. elegans SREBP promoter operably-linked
to a heterologous gene, wherein said SREBP promoter is derived from nucleotides 66,719-
71,265 of Y47D3 (GI:3646936).

32. The nucleic acid of Claim 31, wherein said heterologous gene encodes a selectable
or detectable marker.

33. A method of detecting a candidate molecule that binds to a polypeptide comprising
SEQ ID NO:2, 4, or 6 comprising:
(a) contacting said polypeptide with one or more candidate molecules under
conditions conducive to binding; and
(b) detecting any binding that occurs between the candidate molecules and said
polypeptide.

ABSTRACT OF THE INVENTION

Drosophila melanogaster and C. elegans that have been genetically modified to
express or mis-express proteins involved in the sterol regulatory element binding protein
(SREBP) pathway are described. These genetically modified animal models have
identifiable phenotypes that make them useful in assays for studying lipid metabolism,
other genes implicated in lipid metabolism, and compounds capable of modulating lipid
metabolism pathways. Methods for studying lipid metabolism in living nematodes using
fluorescently-labelled fatty acid conjugates, such as BODIPY™ fatty acid conjugates, are also
described. Novel SREBP pathway nucleic acid and protein sequences are also described.

GGTTTAATTACCCAAGTTTGAGAATGAACGAAGAATTCGAGGGAGACGTC 50
CCAAATTAATGGGTTCAAACTCTTACTTGCTTCTTAAGCTCCCTCTGCAG

CCTATGTCGGATCCGTTTCTCTCATTGGTCACAAAATTGGATGATATTGC 100 
GG ATAC AGCCTAGGC AAAGAG AGTAACC AGTGTTTT AAC C TACTATAAC G 

GCCATTTCCAAATAACGACCCGCTCGATTTTGACATGGAGCACAACTGGC 150 
CGGTAAAGGTTTATTGCTGGGCGAGCTAAAACTGTACCTCGTGTTGACCG 

AAGAGCCCGGACCATCACAACAACCGGATCCATCAATTCCCGGAAATCAA 200 
TTCTCGGGCCTGGTAGTGTTGTTGGCCTAGGTAGTTAAGGGCCTTTAGTT 

C AC AGTC C GCC AC AGGAATATTATG AT ATTGATGGTC AAC GAGACGTAAG 2 50 
GTGTCAGGCGGTGTCCTTATAATACTATAACTACCAGTTGCTCTGCATTC 

CACCTTACACTCCCTGCTCAACCACAACAACGACGACTTCTTCTCAATGC 3 0 0 
GTGGAATGTGAGGGACGAGTTGGTGTTGTTGCTGCTGAAGAAGAGTTACG 

GATTTTCCCCGCCAAACTTTGATCTCGGCGGAGGCCGTGGACCTTCTCTA 3 50 
CT AAAAGGGGC GGTTTG AAAC TAG AGC C GC CTCC GGC AC C TGGAAG AGAT 

GCCGCCACCCAACAATTATCTGGAGAAGGTCCTGCAAGTATGCTTAACCC 400 
CGGCGGTGGGTTGTTAATAGACCTCTTCCAGGACGTTCATACGAATTGGG 

C TTAC AAAC ATC TCC AC C AAGTGGAGGTT AC C C C C CGGC AG ATGCC T AC A 450 
GAATGTTTGTAGAGGTGGTTCACCTCCAATGGGGGGCCGTCTACGGATGT 

GACCTCTATCACTTGCTCAACAACTCGCCGCGCCAGCGATGACTCCACAT 500 
CTGG AG ATAGTGAACGAGTTGTTG AGCGGCGCGGTC GC TAC TG AGGTGTA 

CAGGCAGCGTCGCTTTTTGTTAATACTAATGGAATTGATCAAAAGAATTT 550 
GTCCGTCGCAGCGAAAAACAATTATGATTACCTTAACTAGTTTTCTTAAA 

CACTCATGCAATGCTATCTTCACCACACCATACCTCAATGACTTCTCAAC 600 
GTGAGTACGTTACGATAGAAGTGGTGTGGTATGGAGTTACTGAAGAGTTG 

CATATACAGAAGCCATGGGACATATCAACGGGTACATGTCTCCATACGAC 650 
GTATATGTCTTCGGTACCCTGTATAGTTGCCCATGTACAGAGGTATGCTG 

C AAGCTC AAGGC CC ATC AGGACC ATC ATATT AC TC AC AAC AC C ATC AATC 7 00 
GTTCGAGTTCCGGGTAGTCCTGGTAGTATAATGAGTGTTGTGGTAGTTAG 

TCCACCACCTCATCACCACCATCACCACCCGATGCCAAAAATCCATGAGA 750 
AGGTGGTGGAGTAGTGGTGGTAGTGGTGGGCTACGGTTTTTAGGTACTCT 

ACCCTGAACAAGTGGCATCTCCATCGATTGAAGATGCTCCAGAGACGAAA 800 
TGGGACTTGTTCACCGTAGAGGTAGCTAACTTCTACGAGGTCTCTGCTTT 



FIG. 3A 






CCAACTCATTTGGTTGAACC ACAAAGTCCAAAAAGCCCGC AGAATATGAA 850 
GGTTGAGTAAACCAACTTGGTGTTTCAGGTTTTTCGGGCGTCTTATACTT 

AGAGGAGCTTCTTCGGTTACTAGTTAACATGTCTCCGAGTGAAGTTGAAC 9 00 
TCTCCTCGAAGAAGCCAATGATCAATTGTACAGAGGCTCACTTCAACTTG 

GGTTAAAGAATAAAAAATCAGGAGCATGTTCAGCGACGAATGGGCCATCG 9 50 
CCAATTTCTTATTTTTTAGTCCTCGTACAAGTCGCTGCTTACCCGGTAGC 

AGGAGT AAGGAGAAGGC GGCG AAG ATTGTG ATTC AGG AG AC AGC GGAAGG 10 00 
TCCTCATTCCTCTTCCGCCGCTTCTAACACTAAGTCCTCTGTCGCCTTCC 

GGATGAAGATGAGGATGATGAGGATAGTGATTCCGGGGAGACTATGTCTC 1050 
CCTACTTCTACTCCTACTACTCCTATCACTAAGGCCCCTCTGATACAGAG 

AGGGAACTACTATTATTGTTCGAAGACCAAAAACCGAGCGTCGTACGGC A 110 0 
TCCCTTGATGATAATAACAAGCTTCTGGTTTTTGGCTCGCAGCATGCCGT 

C AC AATC TC ATCGAAAAG AAGT ATAGATGC TC AATAAATG ATCGAATTC A 1150 
GTGTT AGAGT AGC TTTTC TTC ATATCT AC G AGTTATTTACTAGCTT AAGT 

AC AGCTGAAAGTACTTTTGTGTGGGGATGAAGCTAAGCTTTCAAAATCGG 12 00 
TGTC G AC TTTC ATGAAAAC AC ACCC C TAC TTC GATTC G AAAGTTTTAGC C 

C AAC AC TACGACGGGC T ATTG AAC ATATCGAGG AGGTTGAAC ACGAGAAT 12 50 
GTTGTGATGCTGCCCGATAACTTGTATAGCTCCTCCAACTTGTGCTCTTA 

C AGGTGTTGAAGCATC ATGTTGAACAAATGAGAAAGACACTGC AGAATAA 1300 
GTCCACAACTTCGTAGTACAACTTGTTTACTCTTTCTGTGACGTCTTATT 

TCGATTAC CGTACCC GGAAC C AATTC AATAC ACTGAAT ACTC TGC C CG AT 13 50 
AGCTAATGGCATGGGCCTTGGTTAAGTTATGTGACTTATGAGACGGGCTA 

CACCCGTCGAATCATCTCCTTCTCCACCTAGAAATGAGAGAAAACGATCA 1400 
GTGGGC AGC TTAGTAGAGG AAG AGGTGGATC TTTACTCTC TTTTGCTAGT 

CGAATGAGC ACAACGACTCCTATGAAGAATGGAACTAGAGATGGATCTTC 1450 
GCTTACTCGTGTTGCTGAGGATACTTCTTACCTTGATCTCTACCTAGAAG 

GAAAGTTAC CC TTTTTGCGATGCTC CTAGC AGTTCTGATTTTTAATCCGA 1500 
CTTTCAATGGGAAAAACGCTACGAGGATCGTCAAGACTAAAAATTAGGCT 

TTGGATTGCTCGCTGGAAGTGCGATATTCTC AAAAGCCGCTGCAGAAGCT 1550 
AAC CTAAC G AGCGAC CTTC ACGCTATAAGAGTTTTCGGCG ACGTC TTC GA 

CCGATTGC CTCC C CGTTCG AGC ATGGAAGAGTGATTGATGAC C CGG ATGG 1600 
GGCTAACGGAGGGGCAAGCTCGTACCTTCTCACTAACTACTGGGCCTACC 
AACTAGCACTCGGACGCTTTTCTGGGAAGGGAGTATCATCAATATGAGCT 1650
TTGATCGTGAGCCTGCGAAAAGACCCTTCCCTCATAGTAGTTATACTCGA

ATGTCTGGGTGTTC AAC ATCTTAATGATC ATATATGTGGTTGTC AAACTG 17 00 
T AC AG AC C C AC AAGTTGT AGAATTAC TAGTATATAC AC C AAC AGTTTGAC 

CTGATCCATGGTGACCCTGTTCAAGACTTCATGTCCGTTTC ATGGCAGAC 17 50 
GACTAGGTACCACTGGGACAAGTTCTGAAGTACAGGCAAAGTACCGTCTG 

TTTTGTGACGACTCGAGAGAAGGCGAGAGCCGAGTTGAACTCTGGAAATT 1800 
AAAACACTGCTGAGCTCTCTTCCGCTCTCGGCTCAACTTGAGACCTTTAA 

TGAAAGATGCTCAGAGAAAGTTCTGCGAGTGTCTTGCAACGTTGGATCGA 1850 
ACTTTCTACGAGTCTCTTTCAAGACGCTCACAGAACGTTGCAACCTAGCT 

TCGCTTCCATCACCGGGGGTTGATTCGGTGTTTTCGGTTGGCTGGGAATG 190 0 
AGCGAAGGTAGTGGCCCCCAACTAAGCCACAAAAGCCAACCGACCCTTAC 

CGTTCGACATCTTTTGAATTGGTTGTGGATCGGGAGATACATCGCAAGAA 1950 
GCAAGCTGTAGAAAACTTAACCAACACCTAGCCCTCTATGTAGCGTTCTT 

GGCGCAGGTCCACCACGAAGCCTGTCTCAGTCGTTTGTAGGAGTCATGCG 2 0 00 
CCGCGTCCAGGTGGTGCTTCGGACAGAGTCAGCAAACATCCTCAGTACGC 

CAGACTGC AGTTCTCTATCATGAAATTCATC AGCTCCATCTAATGGGTAT 2 050 
GTC TGAC GTC AAGAGATAGTAC TTTAAGT AGTCGAGGTAGATT ACC C ATA 

C AC TGGAAACTTC GAAGAC AC CT ATGAAC CATC CGC C CTAAC GGGC CTCT 2100 
GTG AC C TTTG AAGC TTC TGTGG AT ACTTGGTAGGCGGG ATTGC C C GGAGA 

TCATGTCCCTCTGTGCAGTAAACCTTGCTGAAGCTGCCGGAGC ATCAAAC 2150 
AGTACAGGGAGACACGTCATTTGGAACGACTTCGACGGCCTCGTAGTTTG 

GACGGACTTCCACGCGCCGTCATGGCTCAGATCTACATTTCTGCATCCAT 2200 
C TGCC TGAAGGTGCGCGGC AGTAC CG AGTCTAG ATGTAAAGACGT AGGTA 

CCAATGCCGTTTGGCTCTTCCGAACCTACTCGCACCATTCTTCTCGGGAT 22 50 
GGTTACGGCAAACCGAGAAGGCTTGGATGAGCGTGGTAAGAAGAGCCCTA 

ACTTTTTACGAAGAGCTCGAAGGCACGTGCGTCGAGCTCCGGAGC ACTCG 23 00 
TGAAAAATGCTTCTCGAGCTTCCGTGCACGCAGCTCGAGGCCTCGTGAGC 

GTGTCC C ATTTGTTATGGATCTTCC ATC C AGCGAC AAGAAAGTTC ATGTC 2350 
CACAGGGTAAACAATACCTAGAAGGTAGGTCGCTGTTCTTTCAAGTACAG 

AGATGCGAAAAGGTTGGAGCATGTGTTGAGCTCGAAGCAGAAGCAGTTGA 2400 
TCTACGCTTTTCCAACCTCGTACACAACTCGAGCTTCGTCTTCGTCAACT 
GATTTGGGTCTTTTGTGGAAGATGAGCAATTATCCCCACTTGCTCGAATC 2450
CTAAACCCAGAAAACACCTTCTACTCGTTAATAGGGGTGAACGAGCTTAG 

CGAACAACGCTGAAAGTGTACCTACTCTCCAAACTTGTACAGGAACTTGT 2 50 0 
GCTTGTTGCGACTTTCACATGGATGAGAGGTTTGAACATGTCCTTGAACA 

C GGTGGTGAC GAGATCTTT AC AAAAAATGTGGAACGC ATC CTAAATGAC A 2 550 
GCCACCACTGCTCTAGAAATGTTTTTTACACCTTGCGTAGGATTTACTGT 

ATGAC C GTC TC G ATGATG AAGT AGACGTGGTTG ATGTTTC AAGAC TTTTG 2 60 0 
TACTGGCAGAGCTACTACTTCATCTGCACCAACTACAAAGTTCTGAAAAC 



GTGACAATTTCAACGCAGTGCGCTGCCATTTTGACTAATGAGAAGGATGA 2650
CACTGTTAAAGTTGCGTCACGCGACGGTAAAACTGATTACTCTTCCTACT

GTCAGCGAAATTCGGAACCTGGATCTCTCGAAACGGAGATGCTTGTTGCA 2700
CAGTCGCTTTAAGCCTTGGACCTAGAGAGCTTTGCCTCTACGAACAACGT

CATGGTGGACGCACGTTCTGACATGTGGAATCTATTGGAGGAGTAACAAG 2750
GTACCACCTGCGTGCAAGACTGTACACCTTAGATAACCTCCTCATTGTTC

AATGAGCTGGCACGGCAACACTATTCACTGATCAGGAACTGTCCGCCGAA 2800
TTACTCGACCGTGCCGTTGTGATAAGTGACTAGTCCTTGACAGGCGGCTT

GATTTTGACAGACAATCTGGGTTTGGCGGTTGGCCACGCGTTGTGTGCTC 2850
CTAAAACTGTCTGTTAGACCCAAACCGCCAACCGGTGCGCAACACACGAG

GCAAGATTTGCATAGATGACCGAGATTCCCCGAAAGTCAGTCAATACGTG 2900
CGTTCTAAACGTATCTACTGGCTCTAAGGGGCTTTCAGTCAGTTATGCAC

TGCATTCACACAAAGAAGTCGCTCGAATCCCTCCGACTATTCTCCACATC 2950
ACGTAAGTGTGTTTCTTCAGCGAGCTTAGGGAGGCTGATAAGAGGTGTAG

ATCGCGAGCATCAGGTGTGGTGTCTGGAATTCAGGAAGGTACACGCCGAA 3000
TAGCGCTCGTAGTCCACACCACAGACCTTAAGTCCTTCCATGTGCGGCTT

TGGCCTACGAATGGATTATGAACTCGCTGCTCGACGCGTGGCGTTCCAAT 3050
ACCGGATGCTTACCTAATACTTGAGCGACGAGCTGCGCACCGCAAGGTTA

CTATTCGCATCGAAACCCTACTGGACACAAAGCTTCAAGGGACAATCCAC 3100
GATAAGCGTAGCTTTGGGATGACCTGTGTTTCGAAGTTCCCTGTTAGGTG



GTTTAGTACGCTTTATC AAGAGGCGTATAATC ATTATGCGATTATTAATG 3150 
CAAATCATGCGAAATAGTTCTCCGCATATTAGTAATACGCTAATAATTAC 

GGAC AAGGGGAGATTGTTGGAGACTATTTGTCTACGAGCTC ACGTGCCGA 3200 
CCTGTTC C C C TC TAAC AAC CTC TGATAAAC AGATGCTCGAGTGC ACGGC T 
ATGCTCAACGGAGCCAACCCACAAGCCACGTGGTCAGGCGYCCGACGCGT 3250
TACGAGTTGCCTCGGTTGGGTGTTCGGTGCACCAGTCCGCRGGCTGCGCA 

TCGATCTACAAAAATGGACGCGGTCCGAGGAAGAGTGAGCATGCGACGCT 33 00 
AGCTAGATGTTTTTACCTGCGCCAGGCTCCTTCTCACTCGTACGCTGCGA 

CGGCTCAACCGGACGCATTTCATCTTCATACACTGGTTAAACTACATACT 3 3 50 
GCCGAGTTGGCCTGCGTAAAGTAGAAGTATGTGACCAATTTGATGTATGA 

TC TATGG ATC TTTGAATTGAAC AAAAAATGATTTTATTC AGAATAATGAT 3400 
AG AT AC C TAGAAACTTAAC TTGTTTTTTAC T AAAATAAGTC TT ATT AC T A 

AAATACGATTATATATAAA 
TTTATGC T AAT ATATATTT 
MNEEFEGDVPMSDPFLSLVTKLDDIAPFPNNDPLDFDMEHNWQEPGPSQQ 50

PDPSIPGNQHSPPQEYYDIDGQRDVSTLHSLLNHNNDDFFSMRFSPPNFD 100

LGGGRGPSLAATQQLSGEGPASMLNPLQTSPPSGGYPPADAYRPLSLAQQ 150

LAAPAMTPHQAASLFVNTNGIDQKNFTHAMLSSPHHTSMTSQPYTEAMGH 200

INGYMSPYDQAQGPSGPSYYSQHHQSPPPHHHHHHPMPKIHENPEQVASP 250

SIEDAPETKPTHLVEPQSPKSPQNMKEELLRLLVNMSPSEVERLKNKKSG 300

ACSATNGPSRSKEKAAKIVIQETAEGDEDEDDEDSDSGETMSQGTTIIVR 350

RPKTERRTAHNLIEKKYRCSINDRIQQLKVLLCGDEAKLSKSATLRRAIE 400

HIEEVEHENQVLKHHVEQMRKTLQNNRLPYPEPIQYTEYSARSPVESSPS 450

PPRNERKRSRMSTTTPMKNGTRDGSSKVTLFAMLLAVLIFNPIGLLAGSA 500

IFSKAAAEAPIASPFEHGRVIDDPDGTSTRTLFWEGSIINMSYVWVFNIL 550

MIIYVVVKLLIHGDPVQDFMSVSWQTFVTTREKARAELNSGNLKDAQRKF 600

CECLATLDRSLPSPGVDSVFSVGWECVRHLLNWLWIGRYIARRRRSTTKP 650

VSVVCRSHAQTAVLYHEIHQLHLMGITGNFEDTYEPSALTGLFMSLCAVN 700

LAEAAGASNDGLPRAVMAQIYISASIQCRLALPNLLAPFFSGYFLRRARR 750

HVRRAPEHSVSHLLWIFHPATRKFMSDAKRLEHVLSSKQKQLRFGSFVED 800

EQLSPLARIRTTLKVYLLSKLVQELVGGDEIFTKNVERILNDNDRLDDEV 850

DVVDVSRLLVTISTQCAAILTNEKDESAKFGTWISRNGDACCTWWTHVLT 900

CGIYWRSNKNELARQHYSLIRNCPPKILTDNLGLAVGHALCARKICIDDR 950

DSPKVSQYVCIHTKKSLESLRLFSTSSRASGVVSGIQEGTRRMAYEWIMN 1000

SLLDAWRSNLFASKPYWTQSFKGQSTFSTLYQEAYNHYAIINGTRGDCWR 1050

LFVYELTCRMLNGANPQATWSGXRRVRSTKMDAVRGRVSMRRSAQPDAFH 1100
LHTLVKLHTSMDL
CGGCACGAGGATTAATGCTGATTTCTGGTCTGGACTACACAGCATTGCTG 50
GCCGTGCTCCTAATTACGACTAAAGACCAGACCTGATGTGTCGTAACGAC 

GTATAAGGAGTCGGGAC C AG AGGAGT AAGATTTCGGG AAGGAATC C CGTC 100 
CATATTCCTCAGCCCTGGTCTCCTCATTCTAAAGCCCTTCCTTAGGGCAG 

CGGTAGGGACTACTAGCATTCGCAAGTGACGTCCAGCAACCGGAGGACCC 15 0 
GCCATCCCTGATGATCGTAAGCGTTCACTGCAGGTCGTTGGCCTCCTGGG 

CCAACTGTAGAATCCGCATCACCATCCTAATCCCAACAAACCAATGACAT 2 00 
GGTTGACATCTTAGGCGTAGTGGTAGGATTAGGGTTGTTTGGTTACTGTA 

CTTGAGACCTCACCAGCCATGGATCCCTTCGTGTTCTTCATAGTACTGGC 250 
GAACTCTGGAGTGGTCGGTACCTAGGGAAGCACAAGAAGTATCATGACCG 

ATCGCTTTATGGCGTTCTTTACTTTTTCGACCGCTTCTTCAAGAGTTGCA 3 00 
TAGCGAAATACCGCAAGAAATGAAAAAGCTGGCGAAGAAGTTCTCAACGT 

TGCACTACCCGTACGATGCCTTCCTCAAGAACACCGGGCTGAGTATAAAT 3 50 
ACGTGATGGGC ATGCTACGGAAGGAGTTCTTGTGGCC CGAC TC ATATTTA 

TTC ATGAGCCTCCACTGGCACACGAGTGCCTTTAACAGGACCCTCCTACG 400 
AAGTACTCGGAGGTGACCGTGTGCTCACGGAAATTGTCCTGGGAGGATGC 

CTGGGGATCTGCCGGTAACAGCTGCACCCGGAGAGTAATGATCACCAGCT 450 
GAC C C C TAG AC GGC C ATTGTCGACGTGGGCCTCTC ATTAC TAGTGGTCG A 

TTAATGTAGGAGTCCTGGTCACCTTTTCTCTGCTCCCGATCGGTCTGATC 500 
AATTAC ATCCTC AGGAC C AGTGGAAAAGAGACGAGGGC TAGC C AG AC TAG 

CTGCTCATTGCCACTATCTTCAGCAGTGGTGAACAAGAC AGCTCTTCGTC 550 
GACGAGTAACGGTGATAG AAGTCGTC AC CACTTGTTC TGTCGAGAAGC AG 

TGTATCCTCGCCCGTTGGAGTCCCTGTGCAGCTGGAAATTCTACTGCCCG 600 
AC ATAGGAGCGGGC AACC TC AGGGAC AC GTCGAC C TTT AAGATGAC GGGC 

GCGTCAACTTGCCGTTGGAGGAGATCGGATACTACATCACAACCCTTGTG 650 
CGCAGTTGAACGGCAACCTCCTCTAGCCTATGATGTAGTGTTGGGAACAC 

CTCTGCTTGGTGGTGCACGAGATGGGACACGCCCTGGCCGCTGTGATGGA 700 
GAGACGAACCACCACGTGCTCTACCCTGTGCGGGACCGGCGACACTACCT 

GGATGTGC C TGTC AC CGGGTTTGGAATAAAGTTC ATCTTC TGC CTGC C GT 7 50 
C CTAC ACGGAC AGTGGC CC AAACCTT ATTTC AAGTAGAAG ACGGAC GGC A 

TAGCATACACGGAGCTCTCCCACGACCACTTAAACAGTCTACGTTGGTTC 800 
ATCGTATGTGCCTCGAGAGGGTGCTGGTGAATTTGTCAGATGCAACCAAG 
CGCAAGCTACGTGTTCTGTGCGCTGGAATCTGGCATAATTTTGTGTTCGC 850
GCGTTCGATGCACAAGACACGCGACCTTAGACCGTATTAAAACACAAGCG 

TGGCGTGTGCTATCTCTTAATCTCAACGGTGGGAATCACTATGTCACCTT 9 00 
AC CGC AC ACGAT AGAG AATTAGAGTTGC C AC C C TTAGTGAT AC AGTGG AA 

TGTACGCTTACAACCAACACGTAGTGGTCACTGAACTAACAAGGAAATCC 950 
ACATGCGAATGTTGGTTGTGCATCACCAGTGACTTGATTGTTCCTTTAGG 

C C GC TG AGGGG AG AGCGC GGC TTGC AAGTGG AC AATC AAATAAC C C AAGT 1000 
GGCGACTCCCCTCTCGCGCCGAACGTTCACCTGTTAGTTTATTGGGTTCA 

AAACGGCTGCCCAGTAAACAGCGAGGAGAGTTGGGTGACATGCCTGC AGA 1050 
TTTGCCGACGGGTCATTTGTCGCTCCTCTCAACCCACTGTACGGACGTCT 

ACTCTCTGAAGCTCAAGCCGGGCTACTGTGTGAGTGCGGACTTCGTGCAG 1100 
TGAGAGACTTCGAGTTCGGCCCGATGACACACTCACGCCTGAAGCACGTC 

CTTAACGACGAAAGCAGCGCCATCTCACATCATAGCATTGATGGTCAGCT 1150 
GAATTGCTGCTTTCGTCGCGGTAGAGTGTAGTATCGTAACTACCAGTCGA 

ACAGTGCTGTGATGAACTAAATCCGAACGTAAGCTGCTTCGAGGTGGTGG 12 00 
TGTCACGACACTACTTGATTTAGGCTTGCATTCGACGAAGCTCCACCACC 

AGGACGC AAATGGAGATGTGCCGGTGGAGCTGCCGC AGC ATGTATGTCTC 1250 
TC C TGCGTTTAC C TC TAC AC GGCC ACC TCGACGGCGTCGTAC ATAC AGAG 

AATGTGCGCCGCACTTTGGAGGAGGTCTCCGAGC ACTGCTCGTCCGGAGT 13 00 
TTACACGCGGCGTGAAACCTCCTCCAGAGGCTCGTGACGAGCAGGCCTCA 

TTGC AACGAGGG ATTCTGC CTACG ACC GC TTATACGAAATATC AC TGC C A 13 50 
AACGTTGCTC C CTAAGAC GGATGC TGGCGAATATGCTTTAT AGTGACGGT 

TAATGACGTTCAAGCGACAGAATTTTCGCGGAGAGAAGCTGCCGCCGGTG 1400 
ATTACTGCAAGTTCGCTGTCTTAAAAGCGCCTCTCTTCGACGGCGGCCAC 

ATC TATGTGGGC C ATC C ATGGGATGTC AC TCGAACTGTGGAGGTATCCGC 1450 
TAGAT AC AC C CGGTAGGT ACCC TAC AGTGAGC TTG AC AC CTCC ATAGGCG 

CTTTGTGCCGAGATATAGCTTATTAAAGGCAGCCTGGCCGGATGCCTGGC 1500 
GAAACACGGCTCTATATCGAATAATTTCCGTCGGACCGGCCTACGGACCG 

TGCTGCTCCTCAAGTATAACGTGGTCTTCAGCATAGGATTGGCGTTGATC 1550 
ACGACGAGGAGTTCATATTGCACCAGAAGTCGTATCCTAACCGCAACTAG 

AATGCCATTCCCTGCTTTGGTTTCGATGGCGCCCACATTACCAGCACCGT 1600 
TTACGGTAAGGGACGAAACCAAAGCTACCGCGGGTGTAATGGTCGTGGCA 

GATACACAGCTTCTTGGTGGGCAGAGTGGATCAGCATGCCAAGAGAGATA 1650 
CTATGTGTCGAAGAACCACCCGTCTCACCTAGTCGTACGGTTCTCTCTAT 

TCATCTCGTTGATAATCACCAGCGTGGGTTCCCTTCTCTTTGCACTGGCC 1700 
AGTAGAGCAACTATTAGTGGTCGCACCCAAGGGAAGAGAAACGTGACCGG 

CTGCTTAAGGTGGCCTGGTTGAGTTTTCTGCGACCCCTGCTTTAAGAACT 1750 
GACGAATTCCACCGGACCAACTCAAAAGACGCTGGGGACGAAATTCTTGA 

GAAATGGAAAACTGAAATGGATCCTGGGAGTTCAACTCCCTGCAAAGACG 1800 
CTTTACCTTTTGACTTTACCTAGGACCCTCAAGTTGAGGGACGTTTCTGC 

CTAGACTGCTATTTCACCTTCACGAAACACACAAAAACACAGCGAATTGT 1850 
GATCTGACGATAAAGTGGAAGTGCTTTGTGTGTTTTTGTGTCGCTTAACA 

AGCACCTCAAAGATTCGATAGCTTTTTGTCATAGTCCTTAGTCTTAACTC 1900 
TCGTGGAGTTTCTAAGCTATCGAAAAACAGTATCAGGAATCAGAATTGAG 

GTATTTATTTTCGTACGGTTGTCGAGCTCAAAAATAAAATCAAATTAAGC 1950 
CATAAATAAAAGCATGCCAACAGCTCGAGTTTTTATTTTAGTTTAATTCG 

TAAAAAAAAAAAAAAAAAAAC 
ATTTTTTTTTTTTTTTTTTTG 

MDPFVFFIVTjASLYGVXjYFFDRFFKSCMHYPYDAFLKNTGLSINFMSLHW 50 

HTSAFmTLLRWGSAGNSCTRRVMITSFNVGVLVTFSLLPIGLILLIATI 100 

FSSGEQDSSSSVSSPVGVPVQLEILLPGVNLPLEEIGYYITTLVLCLVVH 150 

EMGHALAAVMEDVPVTGFGIKFIFCLPLAYTELSHDHLNSLRWFRKLRVL 200 

CAGIWHNFVFAGVCYLLISTVGITMSPLYAYNQHVVVTELTRKSPLRGER 250 

GLQVDNQITQVNGCPVNSEESWVTCLQNSLKLKPGYCVSADFVQLNDESS 300 

AISHHSIDGQLQCCDELNPNVSCFEVVEDANGDVPVELPQHVCLNVRRTL 350 

EEVSEHCSSGVCNEGFCLRPLIRNITAIMTFKRQNFRGEKLPPVIYVGHP 400 

WDVTRTVEVSAFVPRYSLLKAAWPDAWLLLLKYNVVFSIGLALINAIPCF 450 

GFDGAHITSTVIHSFLVGRVDQHAKRDIISLIITSVGSLLFALALLKVAW 500 
LSFLRPLL 

GTGTGCCTGACTGTTTTGTAGGTGTAAGGAGGGGCGTGGCCAAATAGTTT 50 
CACACGGACTGACAAAACATCCACATTCCTCCCCGCACCGGTTTATCAAA 

TTGGTATACGGATAGAATTTGGATGAAAAATAAAACGAAATCAAAACATT 100 
AACCATATGCCTATCTTAAACCTACTTTTTATTTTGCTTTAGTTTTGTAA 

TTTCAAAAGCGTGGAAGTTTTGGCCGGCTTGTGGGCATGGCAAAACGTTT 150 
AAAGTTTTCGCACCTTCAAAACCGGCCGAACACCCGTACCGTTTTGCAAA 

TTTGGCTATCCGTTAATCAACATACCGTTGCCCGGGACAATACCCACCAA 200 
AAACCGATAGGCAATTAGTTGTATGGCAACGGGCCCTGTTATGGGTGGTT 

GATCGTTGTACCCTACGAAACTGGATCCGGATCGCTGTCATGGCACTCTC 250 
CTAGCAACATGGGATGCTTTGACCTAGGCCTAGCGACAGTACCGTGAGAG 

TTAATACATCCTCGACTACACCGCAGGAACCGCACCCTTCCGGCGAACCC 300 
AATTATGTAGGAGCTGATGTGGCGTCCTTGGCGTGGGAAGGCCGCTTGGG 

TGGCCCCCCGAACCACAGGTACTCAATAGCAGTACCACGGACCGCAGCCC 350 
ACCGGGGGGCTTGGTGTCCATGAGTTATCGTCATGGTGCCTGGCGTCGGG 

GCCTCCCCTTCTGCCCTGGGCGCAGAGCAGCCCCGCCTTTTTCTACGTCC 400 
CGGAGGGGAAGACGGGACCCGCGTCTCGTCGGGGCGGAAAAAGATGCAGG 

AGCAGATTACTCTGCGAACCAGTGTTCTCCCGTGGACGGAGGGAATGCAG 450 
TCGTCTAATGAGACGCTTGGTCACAAGAGGGCACCTGCCTCCCTTACGTC 

CTTATGGATGCGTTTCGTGCGCCGCTACACGAAGTTTTTAAATTGCTTGA 500 
GAATACCTACGCAAAGCACGCGGCGATGTGCTTCAAAAATTTAACGAACT 

AATTGTGCGCAATCACCAGAGCAGCGAAAACAAACGTACCCTGGAGCACA 550 
TTAACACGCGTTAGTGGTCTCGTCGCTTTTGTTTGCATGGGACCTCGTGT 

ACTGCCTACATGTAGACAACGTAAAGCGCGGAACACACGGGCAGCTGGAC 600 
TGACGGATGTACATCTGTTGCATTTCGCGCCTTGTGTGCCCGTCGACCTG 

CAGATCTTTCCGGAGTATGGCTGCCTGCTGCTCTCGCCCGCCAACCTGTG 650 
GTCTAGAAAGGCCTCATACCGACGGACGACGAGAGCGGGCGGTTGGACAC 

GACGCAGAACTCTCAGAACTTTACTCGGGACACAAACATCCTGAACACGA 700 
CTGCGTCTTGAGAGTCTTGAAATGAGCCCTGTGTTTGTAGGACTTGTGCT 

TATTTCAGTACCATAACCTACAGAAATCAAAAGTTTCCGCGGCGGAAATG 750 
ATAAAGTCATGGTATTGGATGTCTTTAGTTTTCAAAGGCGCCGCCTTTAC 

CTGTTTGGATTACCCATGCAGGACACTGGATTCAAGCGCTATCCATTGCG 800 
GACAAACCTAATGGGTACGTCCTGTGACCTAAGTTCGCGATAGGTAACGC 

CGCTCGGTCGCGTATTATACAGTATGCCTTGACGTTATTCCTCAAGCACA 850 
GCGAGCCAGCGCATAATATGTCATACGGAACTGCAATAAGGAGTTCGTGT 

ACGATATGGAGTATCTGGACACTCTAAAGGAAAAGCTGCTGCGACACTAT 900 
TGCTATACCTCATAGACCTGTGAGATTTCCTTTTCGACGACGCTGTGATA 

CCCCCACTCCCGTTGGCTAGTGCGTCGGCTGAAGAGCCGACGACCATAAC 950 
GGGGGTGAGGGCAACCGATCACGCAGCCGACTTCTCGGCTGCTGGTATTG 

TTACATCTTTTATCCAGGAGAGTACAGGATGTGGGAGCTGGTGCCTTACA 1000 
AATGTAGAAAATAGGTCCTCTCATGTCCTACACCCTCGACCACGGAATGT 

CAGTGGCCTTTATGTTGGTGTTTGCTTATGTGTACTTCTCTGTTCGAAAA 1050 
GTCACCGGAAATACAACCACAAACGAATACACATGAAGAGACAAGCTTTT 

ATCGATGTATTTCGTTCCCGCTTTTTGCTGGCCTTATGTAGCGTAATCAC 1100 
TAGCTACATAAAGCAAGGGCGAAAAACGACCGGAATACATCGCATTAGTG 

CACAGCCGGGAGCTTGGCCATGTCCCTTGGCTTGTGTTTCTTCTTTGGCC 1150 
GTGTCGGCCCTCGAACCGGTACAGGGAACCGAACACAAAGAAGAAACCGG 

TGACAATTTCGCTGCAGTCAAAGGACATTTTCCCCTACCTTGTAATCCTT 1200 
ACTGTTAAAGCGACGTCAGTTTCCTGTAAAAGGGGATGGAACATTAGGAA 

GTGGGATTGGAAAATAGCTTGGTGATCACAAAGAGCGTAGTCTCAATGGA 1250 
CACCCTAACCTTTTATCGAACCACTAGTGTTTCTCGCATCAGAGTTACCT 

CGAGACATTCGACGTGAAGATCCGCGTGGCGCAGGCTCTTAGCAAGGAGG 1300 
GCTCTGTAAGCTGCACTTCTAGGCGCACCGCGTCCGAGAATCGTTCCTCC 

GTTGGCATATATCCAAGACTCTTTTGACGGAGATAACAATTTTGACAATT 1350 
CAACCGTATATAGGTTCTGAGAAAACTGCCTCTATTGTTAAAACTGTTAA 

GGTCTTGCTACTTTCGTGCCCGTCATCCAGGAGTTTTGTATCTTTGCCAT 1400 
CCAGAACGATGAAAGCACGGGCAGTAGGTCCTCAAAACATAGAAACGGTA 

AGTCGGCTTGCTTTCCGATTTTATGCTACAGATGCTGCTCTTCTCAACAA 1450 
TCAGCCGAACGAAAGGCTAAAATACGATGTCTACGACGAGAAGAGTTGTT 

TACTGGCCATGAACATTAAGCGGACCGAGTATACGGCGGAGGCCAAGCAC 1500 
ATGACCGGTACTTGTAATTCGCCTGGCTCATATGCCGCCTCCGGTTCGTG 

CTTCCTAAGATGTTGCTGAGCTGCACCCAAGGGGCTGGTCGACAGGATTT 1550 
GAAGGATTCTACAACGACTCGACGTGGGTTCCCCGACCAGCTGTCCTAAA 

CCGATTTTTCGGGGCCGCCCCAGCACTGCCACCGTTTGTCCCTGGCACAT 1600 
GGCTAAAAAGCCCCGGCGGGGTCGTGACGGTGGCAAACAGGGACCGTGTA 

TTCAGCGTTCTCAGTCGCATCCAAAACTGTGTTTTGCTGATCCCGCATCT 1650 
AAGTCGCAAGAGTCAGCGTAGGTTTTGACACAAAACGACTAGGGCGTAGA 

GTTAGCGATCGTACAAGCTTGGTTAATGGACACTCGTCGCCGGAGCAACG 1700 
CAATCGCTAGCATGTTCGAACCAATTACCTGTGAGCAGCGGCCTCGTTGC 

AATACCCAAACGCATAAAGATTGTAAATTTCTGGGCGCGGACTCGCTTTT 1750 
TTATGGGTTTGCGTATTTCTAACATTTAAAGACCCGCGCCTGAGCGAAAA 

TTCAGCGTGCCTTCATGATCTGGATGATTGTGTGGATATGCTCTATAGTT 1800 
AAGTCGCACGGAAGTACTAGACCTACTAACACACCTATACGAGATATCAA 

TATAATTCGGGATATCTGGAGCAGTTGTTTAGCATGCAGAGCAACGGCAC 1850 
ATATTAAGCCCTATAGACCTCGTCAACAAATCGTACGTCTCGTTGCCGTG 

AATGACGGCAACCCTTGAACTTCAACGGCGACTACAGGCGGGTCGGGGAG 1900 
TTACTGCCGTTGGGAACTTGAAGTTGCCGCTGATGTCCGCCCAGCCCCTC 

CAGTCAGCAGTTTTTTCGAGGGATGGCAAGCGGACGGGCAGCGTGCCACG 1950 
GTCAGTCGTCAAAAAAGCTCCCTACCGTTCGCCTGCCCGTCGCACGGTGC 

AGTGCGCCAAGCGGAAGCGGCTTTTCTACGCCAATAAAAGCTCCTCTAGC 2000 
TCACGCGGTTCGCCTTCGCCGAAAAGATGCGGTTATTTTCGAGGAGATCG 

GATCGATATAAACGAAACGGCCGAGGAAATGATGAGACTTCGATATCCCA 2050 
CTAGCTATATTTGCTTTGCCGGCTCCTTTACTACTCTGAAGCTATAGGGT 

GCTTCGACCTAAACTATTTCCTTTCAAACTTCCACTGGTCCACGATTATG 2100 
CGAAGCTGGATTTGATAAAGGAAAGTTTGAAGGTGACCAGGTGCTAATAC 

AAACAGTACAACATCTCACTAAGTGGGCACTACGTTACCCTGCTACCGAC 2150 
TTTGTCATGTTGTAGAGTGATTCACCCGTGATGCAATGGGACGATGGCTG 

CATTCGCCTTAGTCATGCCATCGCTCCGGAGCTAGCCACTCTGTTGCGGA 2200 
GTAAGCGGAATCAGTACGGTAGCGAGGCCTCGATCGGTGAGACAACGCCT 

ATCCGCAGGAGCAGCTGCAACAAAATTTTCAATGGAAGGCCCTAGCCGCT 2250 
TAGGCGTCCTCGTCGACGTTGTTTTAAAAGTTACCTTCCGGGATCGGCGA 

GCACTCGATCCGCTGGACTTTAACGATGACGACGTGCGCCGTGAGTCTCC 2300 
CGTGAGCTAGGCGACCTGAAATTGCTACTGCTGCACGCGGCACTCAGAGG 

GATGGTAATGGCAGAGGGGTTGCCTCTGGTTCCCAAGAGCCCCATGGAAA 2350 
CTACCATTACCGTCTCCCCAACGGAGACCAAGGGTTCTCGGGGTACCTTT 

TATTTTTCGCCATCCTCTTGTGCTGCATCAGCATCTTCGTGCTTTGCTAC 2400 
ATAAAAAGCGGTAGGAGAACACGACGTAGTCGTAGAAGCACGAAACGATG 

ACGATGGTGGTTTTCTACCGCTGCATATGTACCAGGAACTATGCCGAGTG 2450 
TGCTACCACCAAAAGATGGCGACGTATACATGGTCCTTGATACGGCTCAC 

GCGCTCCAGTTGGCACGAATCTGAGGCACCGTACAAGCAGACTGAGCAAA 2500 
CGCGAGGTCAACCGTGCTTAGACTCCGTGGCATGTTCGTCTGACTCGTTT 

TCCTGGAGGGAGTTCCAACGCAAATCGCCGGACACAAACATCGCATTGAA 2550 
AGGACCTCCCTCAAGGTTGCGTTTAGCGGCCTGTGTTTGTAGCGTAACTT 

TGCCTGGTGTCTGACGGCGCCTACATAATCAGCTGCTGCCTTAAAGGCCA 2600 
ACGGACCACAGACTGCCGCGGATGTATTAGTCGACGACGGAATTTCCGGT 

AATCCGAGTGTGGGATGCACGCAGTGGCGAGCAGCTAACCAGCATCTCCC 2650 
TTAGGCTCACACCCTACGTGCGTCACCGCTCGTCGATTGGTCGTAGAGGG 

GATCCGATATTCAGATCTCTCAGCAGCGGACGGATGGGCAGACGCTGGTA 2700 
CTAGGCTATAAGTCTAGAGAGTCGTCGCCTGCCTACCCGTCTGCGACCAT 

CGAAAGCTGGCCGTGTCACCGGTCTGGTGCCTTGACTACTTCGATAATCT 2750 
GCTTTCGACCGGCACAGTGGCCAGACCACGGAACTGATGAAGCTATTAGA 

AATCGCAGTAGGCTGCGCCAACGGCCGCGTAGAATTGTGGGAATCCCCTG 2800 
TTAGCGTCATCCGACGCGGTTGCCGGCGCATCTTAACACCCTTAGGGGAC 

CGGGATTGCTTAAGTGTGCATACCAGGAAGACGCGAAGAGAAACCAGGGT 2850 
GCCCTAACGAATTCACACGTATGGTCCTTCTGCGCTTCTCTTTGGTCCCA 

ATAACCCACATCCACCTGAACGGCGATCGAGTGATTGTGGCGCGTCTTAA 2900 
TATTGGGTGTAGGTGGACTTGCCGCTAGCTCACTAACACCGCGCAGAATT 

TGGCCGACTAGATTTTTACCGCTTAGAGACGTACTACAAGGGGAAGCAAA 2950 
ACCGGCTGATCTAAAAATGGCGAATCTCTGCATGATGTTCCCCTTCGTTT 

TCGACTGGGGTTTTACCTCGGCTTACAGGAGAACTCATGTTCGAACTGGA 3000 
AGCTGACCCCAAAATGGAGCCGAATGTCCTCTTGAGTACAAGCTTGACCT 

TCCACTGGAAGCCTGGGATTAATGTTGCAGCAGCAGCGCTGTCAGCAAGA 3050 
AGGTGACCTTCGGACCCTAATTACAACGTCGTCGTCGCGACAGTCGTTCT 

AGCATCCCAGAAGACCACCAAGGAGGAAATGAAAATCACATTGGAGGGTG 3100 
TCGTAGGGTCTTCTGGTGGTTCCTCCTTTACTTTTAGTGTAACCTCCCAC 

TAAGACTAGCCCATCAGCAGCCAATCACATGCATGCAGGTCGTTAACGAC 3150 
ATTCTGATCGGGTAGTCGTCGGTTAGTGTACGTACGTCCAGCAATTGCTG 

ATGGTTTTCACTGGCAGCCAGGATCACACCCTCAAGGTGTATTGCCTCAA 3200 
TACCAAAAGTGACCGTCGGTCCTAGTGTGGGAGTTCCACATAACGGAGTT 

TAAGTCGGATGTTGAGTATACGCTCCACGGTCACTGTGGGCCTGTAACCT 3250 
ATTCAGCCTACAACTCATATGCGAGGTGCCAGTGACACCCGGACATTGGA 

GTCTCTTTGTGGATCGCTGGCAACCTGGCACAGGGGGGTCTGGGTCCCAG 3300 
CAGAGAAACACCTAGCGACCGTTGGACCGTGTCCCCCCAGACCCAGGGTC 

GACGGCCTGCTCTGCGTATGGGATCTGTTCACGGGAGCCTGCATGTATAA 3350 
CTGCCGGACGAGACGCATACCCTAGACAAGTGCCCTCGGACGTACATATT 

TATACAAGCTCACGACGGAGCCGTCAGCTGCCTGGCCTGTGCGCCCAGTT 3400 
ATATGTTCGAGTGCTGCCTCGGCAGTCGACGGACCGGACACGCGGGTCAA 

ACGTAATCTCGCTAGGCACGGACGAGAGGATTTGCGTATGGGAACGATTT 3450 
TGCATTAGAGCGATCCGTGCCTGCTCTCCTAAACGCATACCCTTGCTAAA 

CAGGGAAACCTGTTGACTACCATCAACATCTCAAACGCATACTCGAGCCT 3500 
GTCCCTTTGGACAACTGATGGTAGTTGTAGAGTTTGCGTATGAGCTCGGA 

ACTGATGCTAACACCGTCACTATTGGTTACGAGCAAAATGGGTAAGGCCT 3550 
TGACTACGATTGTGGCAGTGATAACCAATGCTCGTTTTACCCATTCCGGA 

CATTCTTGATTGCCAATATAAGAGGGACAGTAAATAATAAATTTAATTCC 3600 
GTAAGAACTAACGGTTATATTCTCCCTGTCATTTATTATTTAAATTAAGG 

AACACAGGATCTCTTATTGTGTGGGATGTGCGCACTGGGCAGCCGGCTCG 3650 
TTGTGTCCTAGAGAATAACACACCCTACACGCGTGACCCGTCGGCCGAGC 

CGAGGTCAAACTGGACTTTGCAAACCTGCAGCTCTGTCCCAAAATAATGA 3700 
GCTCCAGTTTGACCTGAAACGTTTGGACGTCGAGACAGGGTTTTATTACT 

TGCTTGCCTGCGATTCGGTAGTTTGCGACTACGGAAATGAGATCCGCGTC 3750 
ACGAACGGACGCTAAGCCATCAAACGCTGATGCCTTTACTCTAGGCGCAG 

GTCCGCTTTCCTATCGTGGCAGACAAGTGCCATTAAAGCGCAAAATTTTA 3800 
CAGGCGAAAGGATAGCACCGTCTGTTCACGGTAATTTCGCGTTTTAAAAT 

ATTTAGCGTGGTTCGCTAGCACCTAGGAATAAGTTGACTTAAGGCTTTAA 3850 
TAAATCGCACCAAGCGATCGTGGATCCTTATTCAACTGAATTCCGAAATT 

AACGCCTGGAAGTCATTGACGCATTCACTATTTTATATAAATATATACAC 3900 
TTGCGGACCTTCAGTAACTGCGTAAGTGATAAAATATATTTATATATGTG 

TATTAGGGTCCGCAGCAACTTACGGTTTTAACACAAGCTGTACGTATCTC 3950 
ATAATCCCAGGCGTCGTTGAATGCCAAAATTGTGTTCGACATGCATAGAG 

ATCTCTAGAATTTTGTGTTAGTTTGTGGACACTAAGTGTAACAGCTACGC 4000 
TAGAGATCTTAAAACACAATCAAACACCTGTGATTCACATTGTCGATGCG 

TCCGGTAGGTTAAGGAACTAAACTAAATGAATCAGATATATACACATATA 4050 
AGGCCATCCAATTCCTTGATTTGATTTACTTAGTCTATATATGTGTATAT 

TTTTCGCGTAATTATATAAACTACATAGTGTCTTAAAGCGCCTCAGCCTA 4100 
AAAAGCGCATTAATATATTTGATGTATCACAGAATTTCGCGGAGTCGGAT 

ATATAAAATGACTAAATGTTAAAATAAA 
TATATTTTACTGATTTACAATTTTATTT 

MKNKTKSKHFSKAWKFWPACGHGKTFFGYPLINIPLPGTIPTKIVVPYET 50 

GSGSLSWHSLNTSSTTPQEPHPSGEPWPPEPQVLNSSTTDRSPPPLLPWA 100 

QSSPAFFYVQQITLRTSVLPWTEGMQLMDAFRAPLHEVFKLLEIVRNHQS 150 

SENKRTLEHNCLHVDNVKRGTHGQLDQIFPEYGCLLLSPANLWTQNSQNF 200 

TRDTNILNTIFQYHNLQKSKVSAAEMLFGLPMQDTGFKRYPLRARSRIIQ 250 

YALTLFLKHNDMEYLDTLKEKLLRHYPPLPLASASAEEPTTITYIFYPGE 300 

YRMWELVPYTVAFMLVFAYVYFSVRKIDVFRSRFLLALCSVITTAGSLAM 350 

SLGLCFFFGLTISLQSKDIFPYLVILVGLENSLVITKSVVSMDETFDVKI 400 

RVAQALSKEGWHISKTLLTEITILTIGLATFVPVIQEFCIFAIVGLLSDF 450 

MLQMLLFSTILAMNIKRTEYTAEAKHLPKMLLSCTQGAGRQDFRFFGAAP 500 

ALPPFVPGTFQRSQSHPKLCFADPASVSDRTSLVNGHSSPEQRIPKRIKI 550 

VNFWARTRFFQRAFMIWMIVWICSIVYNSGYLEQLFSMQSNGTMTATLEL 600 

QRRLQAGRGAVSSFFEGWQADGQRATSAPSGSGFSTPIKAPLAIDINETA 650 

EEMMRLRYPSFDLNYFLSNFHWSTIMKQYNISLSGHYVTLLPTIRLSHAI 700 

APELATLLRNPQEQLQQNFQWKALAAALDPLDFNDDDVRRESPMVMAEGL 750 

PLVPKSPMEIFFAILLCCISIFVLCYTMVVFYRCICTRNYAEWRSSWHES 800 

EAPYKQTEQILEGVPTQIAGHKHRIECLVSDGAYIISCCLKGQIRVWDAR 850 

SGEQLTSISRSDIQISQQRTDGQTLVRKLAVSPVWCLDYFDNLIAVGCAN 900 

GRVELWESPAGLLKCAYQEDAKRNQGITHIHLNGDRVIVARLNGRLDFYR 950 

LETYYKGKQIDWGFTSAYRRTHVRTGSTGSLGLMLQQQRCQQEASQKTTK 1000 

EEMKITLEGVRLAHQQPITCMQVVNDMVFTGSQDHTLKVYCLNKSDVEYT 1050 

LHGHCGPVTCLFVDRWQPGTGGSGSQDGLLCVWDLFTGACMYNIQAHDGA 1100 

VSCLACAPSYVISLGTDERICVWERFQGNLLTTINISNAYSSLLMLTPSL 1150 

LVTSKMGKASFLIANIRGTVNNKFNSNTGSLIVWDVRTGQPAREVKLDFA 1200 
NLQLCPKIMMLACDSVVCDYGNEIRVVRFPIVADKCH 

GTTTATTAAGCTGCAAATATACTCGTGAAAAAAATCAAAACAACCATGAA 50 
CAAATAATTCGACGTTTATATGAGCACTTTTTTTAGTTTTGTTGGTACTT 

CAACAAGTGTTGCAACTATTACTAACTAGTCGCTAGTTTAAAGCAAAGTG 100 
GTTGTTCACAACGTTGATAATGATTGATCAGCGATCAAATTTCGTTTCAC 

CGTTGACATTAACCAGTTATGGAAAAACAAAAGCACACGTGAACTAAGAA 150 
GCAACTGTAATTGGTCAATACCTTTTTGTTTTCGTGTGCACTTGATTCTT 

AACAGATAGAAGGTGGTAAAGCATTCGCAATGGACACGACACTGATGAAC 200 
TTGTCTATCTTCCACCATTTCGTAAGCGTTACCTGTGCTGTGACTACTTG 

TTAATAGACGCTCCGCTGGACGAGTCCATGGATTTGTTCAAAGCGGAGGA 250 
AATTATCTGCGAGGCGACCTGCTCAGGTACCTAAACAAGTTTCGCCTCCT 

TGTCTTCGAACCGTTCGACGCCGACCTGCACTCGGACATGCTGGACATCA 300 
ACAGAAGCTTGGCAAGCTGCGGCTGGACGTGAGCCTGTACGACCTGTAGT 

TCCTCAACGACATGGACCTGGCGCCGACGCAGATGTACAACATGCTGCTG 350 
AGGAGTTGCTGTACCTGGACCGCGGCTGCGTCTACATGTTGTACGACGAC 

GACGAGCCTCGAACGCATACCCAGCAGACGCAGTCCGTGGATCAGCAGCC 400 
CTGCTCGGAGCTTGCGTATGGGTCGTCTGCGTCAGGCACCTAGTCGTCGG 

GCAATCCGTCGAGCAACAGCCGCACGTGAAAAGCGAGCACTCTTCGCCAG 450 
CGTTAGGCAGCTCGTTGTCGGCGTGCACTTTTCGCTCGTGAGAAGCGGTC 

TGCACATCAAGGAGGAACTGCATCAGCAGCAACAACAGTCGCCGCTTCTC 500 
ACGTGTAGTTCCTCCTTGACGTAGTCGTCGTTGTTGTCAGCGGCGAAGAG 

GTCTACAAACCAGATCCCCTCATAGCCACAAGCTACAATTGTCCCCAGCA 550 
CAGATGTTTGGTCTAGGGGAGTATCGGTGTTCGATGTTAACAGGGGTCGT 

ACAGCCGACGGGCCTTTTGAAGGCCGCCCAACCAACAGCCACCATACATC 600 
TGTCGGCTGCCCGGAAAACTTCCGGCGGGTTGGTTGTCGGTGGTATGTAG 

ACATGGACGCCCAGCGGATGCCGCCGAACACGGCGGTGTATCCCCCATCT 650 
TGTACCTGCGGGTCGCCTACGGCGGCTTGTGCCGCCACATAGGGGGTAGA 

CTGGGCAGTAGCTTTGTCTACCAGTCCATGTCCCCGCCCACGTCGCCGGT 700 
GACCCGTCATCGAAACAGATGGTCAGGTACAGGGGCGGGTGCAGCGGCCA 

GGAGTCTGCGAACCAGAATGTCAATGTCATGCAGCCCGTTGCTGCAACTC 750 
CCTCAGACGCTTGGTCTTACAGTTACAGTACGTCGGGCAACGACGTTGAG 

CTGCTCCCGCTTCTGCTCCTTTGCCCCAGCAGTCGTATCCGCAACCCTTC 800 
GACGAGGGCGAAGACGAGGAAACGGGGTCGTCAGCATAGGCGTTGGGAAG 

ATTACGTACAACTCTAAGGCCGGAATGACTTCCGATGAAGCCATGTACTT 850 
TAATGCATGTTGAGATTCCGGCCTTACTGAAGGCTACTTCGGTACATGAA 

GCTCTTGCAGCCCACGGTAGCCAGTCCAACCCCATCTCCACCTGTGGCTC 900 
CGAGAACGTCGGGTGCCATCGGTCAGGTTGGGGTAGAGGTGGACACCGAG 

CACCACCGACAAGCACAGGTAGTCGGGCCAGCAAGGTGCGAGTGGCACCA 950 
GTGGTGGCTGTTCGTGTCCATCAGCCCGGTCGTTCCACGCTCACCGTGGT 

CTGGCTCCGTCACCTGCCGCTATGGAAGTCCAGGGCAAGGTACCTATCAA 1000 
GACCGAGGCAGTGGACGGCGATACCTTCAGGTCCCGTTCCATGGATAGTT 

CCGGGTTCAACCCAAGGTGAAGGAAGTAAAGCGCTCGGCCCACAACGCCA 1050 
GGCCCAAGTTGGGTTCCACTTCCTTCATTTCGCGAGCCGGGTGTTGCGGT 

TCGAGCGGCGCTATCGCACCTCAATCAACGACAAGATTAACGAGTTGAAG 1100 
AGCTCGCCGCGATAGCGTGGAGTTAGTTGCTGTTCTAATTGCTCAACTTC 

AACTTGGTAGTGGGAGAGCAGGCCAAGCTGAACAAGTCCGCAGTGTTGCG 1150 
TTGAACCATCACCCTCTCGTCCGGTTCGACTTGTTCAGGCGTCACAACGC 

GAAATCCATAGACAAGATTCGGGATCTGCAACGCCAGAATCACGATCTGA 1200 
CTTTAGGTATCTGTTCTAAGCCCTAGACGTTGCGGTCTTAGTGCTAGACT 

AGGCAGAGTTGCAGCGCCTGCAGAGGGAGCTAATGGCACGCGACGGCTCC 1250 
TCCGTCTCAACGTCGCGGACGTCTCCCTCGATTACCGTGCGCTGCCGAGG 

AAGGTGAAGGATTTACTTCAGCTGGGCACTCGGCCTGGTAGAGCATCCAA 1300 
TTCCACTTCCTAAATGAAGTCGACCCGTGAGCCGGACCATCTCGTAGGTT 

GAAGCGCCGCGAGAGCTCGCAGACCTTTACCACGGATGCCGGACTGACGC 1350 
CTTCGCGGCGCTCTCGAGCGTCTGGAAATGGTGCCTACGGCCTGACTGCG 

CGCCACGCAGCGATGAATCGGATCCTTCGCTCTCGCCCATGCACTCGGAC 1400 
GCGGTGCGTCGCTACTTAGCCTAGGAAGCGAGAGCGGGTACGTGAGCCTG 

ATCTCGTTGCCGCCATCACCCTATGGTGGATCCACCGCCAGCTGTAGCAG 1450 
TAGAGCAACGGCGGTAGTGGGATACCACCTAGGTGGCGGTCGACATCGTC 

TGGCAGCAGCAGCAGCAATGAAGAACCACTGGTGGTGCCCAGCTCTATGC 1500 
ACCGTCGTCGTCGTCGTTACTTCTTGGTGACCACCACGGGTCGAGATACG 

GCGGCATGGCCACCCACTCTCGCCTCGGACTCTGCATGTTTATGTTCGCC 1550 
CGCCGTACCGGTGGGTGAGAGCGGAGCCTGAGACGTACAAATACAAGCGG 

ATCCTGGCCGTCAATCCCTTCAAGACCTTTCTCCAGCGCGGCCACTATGA 1600 
TAGGACCGGCAGTTAGGGAAGTTCTGGAAAGAGGTCGCGCCGGTGATACT 

CAGTAATGACGATCTTGGCGACATGAGCGGTCAAAGACGCATTCTCTCTT 1650 
GTCATTACTGCTAGAACCGCTGTACTCGCCAGTTTCTGCGTAAGAGAGAA 

ACGACGTGGAAGGTGAAGGTTTTGCTGTCTGGCAGCAGAGTTCCTGGATA 1700 
TGCTGCACCTTCCACTTCCAAAACGACAGACCGTCGTCTCAAGGACCTAT 

TGGCTATTGAACTTCACACTGATGCTTGGATGCTTGGTGAAATTGCTGGT 1750 
ACCGATAACTTGAAGTGTGACTACGAACCTACGAACCACTTTAACGACCA 

TTACGGTGATCCGCAGCTGGACGCGCAAACGGACGCCTACTGCCAGCACA 1800 
AATGCCACTAGGCGTCGACCTGCGCGTTTGCCTGCGGATGACGGTCGTGT 

GGCAGCGGGCTGACTTCTATTTTAGCCAAGGACAGTCGTCTCAGGCCTAC 1850 
CCGTCGCCCGACTGAAGATAAAATCGGTTCCTGTCAGCAGAGTCCGGATG 

GCCGGTTACCTCAACTGTCTGCATATGTTTGGATTAAGTCTACCGGCGTC 1900 
CGGCCAATGGAGTTGACAGACGTATACAAACCTAATTCAGATGGCCGCAG 

GCGCTTGGAGTGTTACTTGCAGACCACGTGGCAGTTCCTTCGTTTTCTTT 1950 
CGCGAACCTCACAATGAACGTCTGGTGCACCGTCAAGGAAGCAAAAGAAA 

TCCATCGCCTCTGGCTGGGTCGGGTGCTGTCACGGCGGTCCGGTGGGCTG 2000 
AGGTAGCGGAGACCGACCCAGCCCACGACAGTGCCGCCAGGCCACCCGAC 

TTTAGCAACGCCGCCAGCAGGAAACAGGCGCTGGCATCTGCACGCGAACT 2050 
AAATCGTTGCGGCGGTCGTCCTTTGTCCGCGACCGTAGACGTGCGCTTGA 

GGCCCTGCTCTTCAACCGACTGAATCAATTGCAACTGACTGGAAATGGAA 2100 
CCGGGACGAGAAGTTGGCTGACTTAGTTAACGTTGACTGACCTTTACCTT 

GCCGCGGTGACATGAACGGCATTATGATGGCACTATTCGCAAGCAACATG 2150 
CGGCGCCACTGTACTTGCCGTAATACTACCGTGATAAGCGTTCGTTGTAC 

GCTGAAGTGGCGCACAATCTACTGACACCGCGCGAGACCATCTGCATCCA 2200 
CGACTTCACCGCGTGTTAGATGACTGTGGCGCGCTCTGGTAGACGTAGGT 

CGTAATGACAGCGTTGCGAATGAAGCGCAGTGCCCCAAAATGGTTGCAAC 2250 
GCATTACTGTCGCAACGCTTACTTCGCGTCACGGGGTTTTACCAACGTTG 

AGTTCTTCGCCCGATACTACATGAGCCGGGCTCGTCAAGAGTGCGGTCGC 2300 
TCAAGAAGCGGGCTATGATGTACTCGGCCCGAGCAGTTCTCACGCCAGCG 

ACTAGGGCCACCGAGCAAACGCAGGAGCTACGTTGGGCATTCACAGCCTA 2350 
TGATCCCGGTGGCTCGTTTGCGTCCTCGATGCAACCCGTAAGTGTCGGAT 

TGGATATCGCTACTGCGCCACGCACGTCTTCACGTACGATCTGAGCGACT 2400 
ACCTATAGCGATGACGCGGTGCGTGCAGAAGTGCATGCTAGACTCGCTGA 

CCGGCGAGCAGGATGGATTCTTCACACGTCTTAGGAATCCATGTGATCCC 2450 
GGCCGCTCGTCCTACCTAAGAAGTGTGCAGAATCCTTAGGTACACTAGGG 

GCTGCCCACGTCATTAAGCAATATCGAGAGCATTTGCTGTTTAAATCCAT 2500 
CGACGGGTGCAGTAATTCGTTATAGCTCTCGTAAACGACAAATTTAGGTA 

TCAGTGTCTGGTAGGAGCGGGCCACAAATCGGGAGGCCTGCCCACATCTT 2550 
AGTCACAGACCATCCTCGCCCGGTGTTTAGCCCTCCGGACGGGTGTAGAA 

CTGTCAGCGGAGAGGCGGAACAGTTGCAGCAACAGCAGCACAGCGGCACC 2600 
GACAGTCGCCTCTCCGCCTTGTCAACGTCGTTGTCGTCGTGTCGCCGTGG 

ATTGTCAGCAATGTTCTTAAGTACACGTCCCTCCTTAAGGACACTCTCTG 2650 
TAACAGTCGTTACAAGAATTCATGTGCAGGGAGGAATTCCTGTGAGAGAC 

GGCTGATGAGGATGAGCGGGATACAAACGTGGTGTGGTGGGCCGATGTTT 2700 
CCGACTACTCCTACTCGCCCTATGTTTGCACCACACCACCCGGCTACAAA 

TGGAGACCGCAGTGCACTGGCTCCTTGGTGAAGACACGCTGGCCGAGCAA 2750 
ACCTCTGGCGTCACGTGACCGAGGAACCACTTCTGTGCGACCGGCTCGTT 

TTGTACGGCAGGATCAAGCAAATGCCCACGCAGCTGCAACAGTGCGGCGA 2800 
AACATGCCGTCCTAGTTCGTTTACGGGTGCGTCGACGTTGTCACGCCGCT 

AAACGATCATCTGCCCAAGGCGCTGCATGCTGTGCTGCGAGCTAAGATGA 2850 
TTTGCTAGTAGACGGGTTCCGCGACGTACGACACGACGCTCGATTCTACT 

TCTTACTAAAAAACAATGGCAACGCACTGGACAAAAGTCTCAAGCAATTG 2900 
AGAATGATTTTTTGTTACCGTTGCGTGACCTGTTTTCAGAGTTCGTTAAC 

GTAAACATCCTCTGCGATGAGTCGAGTGTGGAGCTCCAAGAGTGCTTGAC 2950 
CATTTGTAGGAGACGCTACTCAGCTCACACCTCGAGGTTCTCACGAACTG 

TGTCAACCGGATCACCGACGCCAAGGGTATAAAGCTGCTTTTCCAGTTGC 3000 
ACAGTTGGCCTAGTGGCTGCGGTTCCCATATTTCGACGAAAAGGTCAACG 

TTACCTGCGATTGGCTGCTCGAAACTAGGACTGCTCTGTGGGAACTGGAA 3050 
AATGGACGCTAACCGACGAGCTTTGATCCTGACGAGACACCCTTGACCTT 

CACATGAATATGGAGGACGATGGCTTCTACCAAGTGCCAGGTGAAGTGCT 3100 
GTGTACTTATACCTCCTGCTACCGAAGATGGTTCACGGTCCACTTCACGA 

CGAGAAGTTCCAGACCGATTTGAACTCGTTGCGCAACATTGTGGAGAATA 3150 
GCTCTTCAAGGTCTGGCTAAACTTGAGCAACGCGTTGTAACACCTCTTAT 

TACCGAACGCCCAATCGCGCATATATTTGTACGAGGCAGTTTGTCGCCTG 3200 
ATGGCTTGCGGGTTAGCGCGTATATAAACATGCTCCGTCAAACAGCGGAC 

ATGGCTGGAGCCTCACCGTGTCCAACGCAACAGCTCTTGGACAGGAGTCT 3250 
TACCGACCTCGGAGTGGCACAGGTTGCGTTGTCGAGAACCTGTCCTCAGA 

GCGATCACGCAACGCCCACTCGTCCATCTTCTGCGGCAGCAAGGATCGGC 3300 
CGCTAGTGCGTTGCGGGTGAGCAGGTAGAAGACGCCGTCGTTCCTAGCCG 

GGCAGCAGAACTTCGTGGGCGGAGAGCGGGAACGGGCTTCGGCCATGTAC 3350 
CCGTCGTCTTGAAGCACCCGCCTCTCGCCCTTGCCCGAAGCCGGTACATG 

GTGGCCTGCAAGTATCTCCCGCCTGCGCTGCTCAGCTCCCCGGGTGAACG 3400 
CACCGGACGTTCATAGAGGGCGGACGCGACGAGTCGAGGGGCCCACTTGC 

TGCTGGCATGTTAGCCGAGGCGGCCAAGACCCTGGAGAAGGTGGGCGACA 3450 
ACGACCGTACAATCGGCTCCGCCGGTTCTGGGACCTCTTCCACCCGCTGT 

AGCGAAAGCTCAAGGAGTGCTACCAGCTGATGAAGTCGCTGGGCAACGGC 3500 
TCGCTTTCGAGTTCCTCACGATGGTCGACTACTTCAGCGACCCGTTGCCG 

ATTGGCAGCGTGAAGGCTTAGGATAGTAGTGAAGTACATAATAAGTGGCA 3550 
TAACCGTCGCACTTCCGAATCCTATCATCACTTCATGTATTATTCACCGT 

CGAACGTGGTGTGGATTTTCAGCAAATGAATACCCGTTTGCTATTCAAAA 3600 
GCTTGCACCACACCTAAAAGTCGTTTACTTATGGGCAAACGATAAGTTTT 

GAATTACAAATGCCTAGGTCTTTATAATTACGCTATTCCTCTGTTTTCCA 3650 
CTTAATGTTTACGGATCCAGAAATATTAATGCGATAAGGAGACAAAAGGT 

CGCCCGGTTATGCTTAGATTGTAATTTTAAAATTATTTAATATGGACATT 3700 
GCGGGCCAATACGAATCTAACATTAAAATTTTAATAAATTATACCTGTAA 

TTATTTGTTTATTATTTACCGTACTTGTTAAACGTATTTATAACAATAAA 3750 
AATAAACAAATAATAAATGGCATGAACAATTTGCATAAATATTGTTATTT 

TATTTTAACAGATTTAAA 
ATAAAATTGTCTAAATTT 